PCT/US96/02861

SUBTILISIN VARIANTS CAPABLE OF CLEAVING SUBSTRATES 
CONTAINING BASIC RESIDUES 



FIELD OF THE INVENTION
This invention relates to subtilisin variants having altered specificity from wild-type subtilisins.
Specifically, the subtilisin variants are modified so that they efficiently and selectively cleave substrates
containing basic residues. The invention further relates to the DNA encoding these novel polypeptides, as well
as the recombinant materials and methods for producing these subtilisin variants. In a particular aspect, the
present invention provides for processes for cleaving protein substrates containing basic residues.

BACKGROUND OF THE INVENTION 

Site-specific proteolysis is one of the most common forms of post-translational modification of proteins
(for review see Neurath, H. (1989) Trends Biochem. Sci., 14:268). In addition, proteolysis of fusion proteins in
vitro is an important research and commercial tool (for reviews see Uhlen, M. and Moks, T. (1990) Methods
Enzymol., 185:129-143; Carter, P. (1990) in Protein Purification: From Molecular Mechanisms to Large-Scale
Processes, M.R. Ladisch, R.C. Wilson, C.D. Painton, S.E. Builder, Eds. (ACS Symposium Series 427,
American Chemical Society, Washington, D.C.), Chap. 13, p. 181-193; and Nilsson, B. et al. (1992) Current
Opin. Struct. Biol., 2:569). Expressing a protein of interest as a fusion protein facilitates purification when the
fusion contains an affinity domain such as glutathione-S-transferase, Protein A or a poly-histidine tail. The
fusion domain can also facilitate high level expression and/or secretion.

To liberate the protein product from the fusion domain requires selective and efficient cleavage of the
fusion protein. Both chemical and enzymatic methods have been proposed (see references above). Enzymatic
methods are generally preferred as they tend to be more specific and can be performed under mild conditions
that avoid denaturation or unwanted chemical side-reactions. A number of natural and even designed enzymes
have been applied for site-specific proteolysis. Although some are generally more useful than others (Forsberg,
G., Baastrup, B., Rondahl, H., Holmgren, E., Pohl, G., Hartmanis, M. and Lake, M. (1992) J. Prot. Chem.,
11:201-211), no one is applicable to every situation given the sequence requirements of the fusion protein
junction and the possible existence of protease sequences within the desired protein product. Thus, an expanded
array of sequence specific proteases, analogous to restriction endonucleases, would make site-specific proteolysis
a more widely used method for processing fusion proteins or generating protein/peptide fragments either in vitro
or in vivo.

The processing of prohormones by the KEX2-related family of serine endoproteases illustrates one of
the most precise proteolytic events found in nature (for reviews see Steiner, D. F., Smeekens, S. P., Ohagi, S. and
Chan, S. J. (1992) J. Biol. Chem., 267, 23435-23436 and Smeekens, S. P. (1993) Bio/Technology 11, 182-186).
This family of proteases, which includes the yeast KEX2 and the mammalian PC2, PC3 and furin enzymes, is
homologous to the bacterial serine protease subtilisin (Kraut, J. (1977) Annu. Rev. Biochem., 46:331-358).
Subtilisin has a broad substrate specificity that reflects its role as a scavenger protease. In contrast, these
eukaryotic enzymes are very specific for cleaving substrates containing two basic residues and are thus well-suited
for site-specific proteolysis.

All of these eukaryotic enzymes strongly require Arg at the P1 position, and either Arg, Lys or Pro at
the P2 position of peptide substrates. The prohormone convertases from higher eukaryotes such as furin, PC2,
and PC3 also have an absolute requirement for Arg at the P4 position (Bresnahan, P. A., Leduc, R., Thomas, L.,
Thorner, J., Gibson, H. L., Brake, A. J., Barr, P. J. and Thomas, G. (1990) J. Cell. Biol. 111, 2851; Wise, R. J.,
Barr, P. J., Wong, P. A., Kiefer, M. C., Brake, A. J., and Kaufman, R. J. (1990) Proc. Natl. Acad. Sci. USA 87,
9378-9382; Hosaka, M., Nagahama, M., Kim, W.-S., Watanabe, T., Hatsuzawa, K., Ikemizu, J., Murakami,
K. and Nakayama, K. (1991) J. Biol. Chem. 266, 12127-12130; Matthews, D. J., Goodman, L. J., Gorman, C.
M., and Wells, J. A. (1994) Protein Science 3, 1197-1205).

Despite the very narrow specificity of the pro-hormone processing enzymes, in some cases they are
capable of rapid cleavage of target sequences. For example, the kcat/Km ratio for KEX2 to cleave a good
substrate (e.g. acetyl-pMYRK-MCA) is [value illegible] M^-1 s^-1 (Brenner, C. and Fuller, R.S. (1992) Proc. Natl. Acad.
Sci. USA, 89:922-926) compared to 3 x 10^[exponent illegible] for subtilisin cleaving a good substrate (e.g. suc-AAPF-pNA) (Estell,
D. A., Graycar, T. P., Miller, J. V., Powers, D. B., Burnier, J. P., Ng, P. G. and Wells, J.A. (1986) Science,
233:659-663).

However, the eukaryotic proteases are expressed in small amounts (Bravo, D. B., Gleason, J. B.,
Sanchez, R. I., Roth, R. A., and Fuller, R. S. (1994) J. Biol. Chem., 269:25830-25837 and Matthews, D. J.,
Goodman, L. J., Gorman, C. M., and Wells, J. A. (1994) Protein Science, 3:1197-1205), making them
impractical to apply presently to processing of fusion proteins in vitro. Subtilisin BPN', however, can be
expressed in large amounts (Wells, J.A., Ferrari, E., Henner, D.J., Estell, D.A. and Chen, E.Y. (1983) Nucl.
Acids Res., 11:7911-7929).

Extensive protein engineering studies of subtilisin, and especially subtilisin BPN', have identified
several residues in the S1 and S2 active site of the enzyme where amino acid substitutions lead to large changes
in substrate specificity (Wells, J. A., and Estell, D.A., (1988) Trends Biochem. Sci., 13:291-297; Carter, P., et
al., (1989) PROTEINS: Structure, Function, and Genetics, 6:240-248). X-ray crystal structures of subtilisin
containing bound transition state analogues (Wright, C. S., Alden, R. A. and Kraut, J. (1969) Nature, 221:235-
242; McPhalen, C.A. and James, N.G. (1988) Biochemistry, 27:6582-6598; Bode, W., Papamokos, E., Musil,
D., Seemueller, U. and Fritz, H. (1986) EMBO J., 5:813-818; and Bott, R., Ultsch, M., Kossiakoff, A., Graycar,
T., Katz, B. and Power, S. (1988) J. Biol. Chem., 263:7895-7906) can be used to locate active site residues that
are in close proximity to side chains at key positions in substrate peptides (Wells, J.A., (1987) Proc. Natl. Acad.
Sci. USA 84:1219-1223). Consideration of electrostatic interactions between charged peptide substrates and
subtilisin can be used to tailor the substrate binding cleft of the subtilisin BPN' to favor complementary charged
substrates (Wells, J.A., et al., (1987) Proc. Natl. Acad. Sci. USA, 84:1219-1223). Previous work has shown that
replacement of residues at positions 156 and 166 in the S1 binding site of subtilisin BPN' with various charged
residues leads to improved specificity for complementary charged substrates.

A substantial amount of protein engineering has been applied to the specificity determinants of the S4
subsite of subtilisin BPN' in efforts to alter specificity for P4 substrates (Eder, J., Rheinnecker, M., and Fersht,
A. R. (1993) FEBS Lett. 335, 349-352; Rheinnecker, M., Baker, G., Eder, J., and Fersht, A. R. (1993)
Biochemistry 32, 1199-1203; Rheinnecker, M., Eder, J., Pandey, P.S., and Fersht, A. R. (1994) Biochemistry 33,
221-225). However, the mutations introduced consisted entirely of hydrophobic substitutions, thus preserving
the overall hydrophobic substrate preference in the site.

Previous attempts to introduce, remove or reverse charge specificity in enzyme active sites have been
met with considerable difficulty. This has generally been attributed to a lack of stabilization of the introduced
charge or enzyme-substrate ion pair complex by the wild-type enzyme environment (Hwang, J.K. and Warshel,
A. (1988) Nature, 334:270-272). For example, Stennicke et al. (Stennicke, H.R.; Ujje, H.M.; Christensen, U.;
Remington, S.J.; and Breddam (1994) Prot. Eng. 7:911-916) made acidic (D/E) mutations at five residues in
the P1' binding site of carboxypeptidase Y in an attempt to change the P1' preference from Phe to Lys/Arg. Only the
L272D and L272E mutations were found to alter the specificity in the desired direction, up to a 1.5-fold preference
for Lys/Arg over Phe, and the others simply resulted in less active enzymes having substrate preferences similar
to wild-type. In the case of trypsin, a protease that is highly specific for basic P1 residues, recruitment of
chymotrypsin-like (hydrophobic P1) specificity required not only mutation of the ion pair-forming Asp189 to
Ser, but also transplantation of two more distant surface loops from chymotrypsin (Graf, L., Jancso, A., Szilagyi,
L., Hegyi, G., Pinter, K., Naray-Szabo, G., Hepp, J., Medzihradszky, K., and Rutter, W. J., Proc. Natl. Acad.
Sci. USA (1988) 85:4961-4965 and Hedstrom, L., Szilagyi, L. and Rutter, W. J., Science (1992) 255:1249-
1253).

In the present work, we have also verified that relatively low specificity is gained by introducing single
ion-pairs between enzyme and substrate. However, when two or more choice ionic interactions were
simultaneously engineered into subtilisin BPN', the resulting variants had higher specificity for basic residues
in each of the subsites due to a non-additive effect.

Accordingly, it is an object to produce a subtilisin variant with basic specificity for use in processing
pro-proteins made by recombinant techniques.

SUMMARY OF THE INVENTION

The present invention provides for subtilisin variants with altered substrate specificity. Preferred
subtilisin variants are highly specific for the efficient cleavage of substrates containing basic residues. The
subtilisin variants have a substrate specificity which is substantially different from the substrate specificity of the
precursor subtilisin from which the amino acid sequence of the variant is derived. The amino acid sequences of
the subtilisin variants are derived by the substitution of one or more amino acids of a precursor subtilisin amino
acid sequence.

In a preferred aspect of the present invention, the subtilisin variants of the present invention are specific
for the cleavage of protein substrates containing basic amino acid residues at substrate positions P1, P2 and P4.
According to this aspect of the present invention subtilisin variants having amino acid substitutions at positions
corresponding to amino acid positions 62, 104 and 166 of subtilisin BPN' produced by Bacillus
amyloliquefaciens are preferred. Accordingly, subtilisin variants are provided wherein amino acids 62, 104 and
166 of subtilisin BPN' are substituted with acidic amino acids. Preferably the acidic amino acid is Asp or Glu,
and most preferably Asp.

Preferred substrates for the subtilisin variants according to this aspect of the present invention contain
either Lys (K) or Arg (R) at substrate positions P2 and P1, practically any residue at P3, and preferably either
Lys or Arg at P4, and again practically any residue at P5. Thus an exemplary good substrate would contain -Asn-
Arg-Met-Arg-Lys- (SEQ ID NO: 76) at -P5-P4-P3-P2-P1- respectively. Additionally, good substrates would
not have Pro at P1', P2', or P3' nor would Ile be present at P1'.

According to a second aspect of the present invention the subtilisin variants are capable of cleaving
protein substrates having basic residues at positions P1 and P2. According to this aspect of the present invention
subtilisin variants having amino acid substitutions at positions corresponding to amino acid positions 62 and 166
of subtilisin BPN' produced by Bacillus amyloliquefaciens are preferred. Subtilisin variants having
substrate specificity for dibasic substrates have an acidic amino acid residue at residue position 62 of subtilisin
naturally produced by Bacillus amyloliquefaciens. In a preferred embodiment, the naturally occurring Asn at
residue position 62 of subtilisin BPN' is preferably substituted with an acidic amino acid residue such as Glu or
Asp and most preferably Asp. The preferred subtilisin variants, having substrate specificity for substrates having
dibasic amino acid residues, additionally have an acidic residue, Asp or Glu, at residue position 166 of subtilisin
BPN'. Thus, the subtilisin BPN' variants containing substitution of amino acids 62 and 166 with acidic amino
acids Glu or Asp are preferred. In particular, a subtilisin variant having amino acid Asp at positions 62 and 166
is preferred (subtilisin BPN' variant N62D/G166D). The subtilisin variants according to this aspect of the
invention may be used to cleave substrates containing dibasic residues such as fusion proteins with dibasic
substrate linkers and processing hormones or other proteins (in vitro or in vivo) that contain dibasic cleavage
sites.

Preferred substrates for the subtilisin BPN' variant N62D/G166D contain either Lys (K) or Arg (R) at
substrate positions P2 and P1, practically any residue at P3, a non-charged hydrophobic residue at P4, and
practically any residue at P5. Thus an exemplary good substrate would contain -Asn-Leu-Met-Arg-Lys- (SEQ
ID NO: 35) at -P5-P4-P3-P2-P1- respectively. Additionally, good substrates would not have Pro at P1', P2', or
P3' nor would Ile be present at P1'.

The invention also includes mutant DNA sequences encoding such subtilisin variants. These mutant
DNA sequences are derived from a precursor DNA sequence which encodes a naturally occurring or recombinant
precursor subtilisin. The mutant DNA sequence is derived by modifying the precursor DNA sequence to encode
the substitution(s) of one or more amino acids encoded by the precursor DNA sequence. These recombinant
DNA sequences encode mutants having an amino acid sequence which does not exist in nature and a substrate
specificity which is substantially different from the substrate specificity of the precursor subtilisin encoded by
the precursor DNA sequence.

Further, the invention includes expression vectors containing such mutant DNA sequences as well as host
cells transformed with such vectors which are capable of expressing the subtilisin variants.
The invention also provides for a process for cleaving a polypeptide such as a fusion protein containing
a substrate linker represented by the formula:

P4-P3-P2-P1

wherein P4 is a basic amino acid or a large hydrophobic amino acid such as Leu or Met; P3 is an amino acid
selected from the naturally occurring amino acids; P2 is a basic amino acid; and P1 is a basic amino acid. The
process includes the step of subjecting the polypeptide to the subtilisin variants described herein under conditions
such that the subtilisin variant cleaves the polypeptide.
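
The P4-P3-P2-P1 linker formula can be illustrated with a short Python sketch that scans a one-letter protein sequence for candidate cleavage sites. This is only an illustration under stated assumptions: the function names are hypothetical, the "large hydrophobic" class is limited to the two residues the text names (Leu and Met), and the example sequences append arbitrary residues after P1. It does not model cleavage rates or the primed-side (P1', P2', P3') preferences.

```python
# Basic amino acids as defined in the text: Arg (R) and Lys (K).
BASIC = set("KR")
# Assumption: only Leu and Met are named as examples of large hydrophobic
# residues, so this sketch treats them as the whole class.
LARGE_HYDROPHOBIC = set("LM")
# The twenty naturally occurring amino acids in one-letter code.
NATURAL = set("ACDEFGHIKLMNPQRSTVWY")


def linker_matches(p4: str, p3: str, p2: str, p1: str) -> bool:
    """Return True if four residues fit the P4-P3-P2-P1 linker formula."""
    return (
        (p4 in BASIC or p4 in LARGE_HYDROPHOBIC)
        and p3 in NATURAL
        and p2 in BASIC
        and p1 in BASIC
    )


def candidate_sites(seq: str) -> list:
    """Return every index i where seq[i] could serve as P1.

    The scissile bond would then lie immediately after seq[i].
    """
    sites = []
    for i in range(3, len(seq)):
        if linker_matches(seq[i - 3], seq[i - 2], seq[i - 1], seq[i]):
            sites.append(i)
    return sites


# Exemplary linkers from the text, followed by two arbitrary residues.
print(candidate_sites("NRMRKSA"))  # Asn-Arg-Met-Arg-Lys linker
print(candidate_sites("NLMRKSA"))  # Asn-Leu-Met-Arg-Lys linker
```

A fuller screen would also reject sites with Pro at P1', P2' or P3' or Ile at P1', as the preferred-substrate descriptions above require.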

BRIEF DESCRIPTION OF THE FIGURES
Figure 1. Structure of a succinyl-Ala-Ala-Pro-BoroPhe (SEQ ID NO: 69) inhibitor bound to the active
site of subtilisin BPN' showing the S2 and S1 binding pocket residues subjected to mutagenesis.

Figure 2. Kinetic analysis of S1 binding site subtilisin mutants versus substrates having variable P1
residues. The kinetic constant kcat/Km was determined from plots of initial rates versus substrate concentration
for the tetrapeptide series succinyl-Ala-Ala-Pro-Xaa-pNA (SEQ ID NO: 69), where Xaa was Lys (SEQ ID NO:
58), Arg (SEQ ID NO: 59), Phe (SEQ ID NO: 56), Met (SEQ ID NO: 60) or Gln (SEQ ID NO: 61) (defined to
the right of the plot).

Figure 3. Kinetic analysis of S2 binding site subtilisin mutants versus substrates having variable P2
residues. The kinetic constant kcat/Km was determined from plots of initial rates versus substrate concentration
for the tetrapeptide series succinyl-Ala-Ala-Xaa-Phe-pNA (SEQ ID NO: 70), where Xaa was Lys (SEQ ID NO:
62), Arg (SEQ ID NO: 64), Ala (SEQ ID NO: 63), Pro (SEQ ID NO: 56), or Asp (SEQ ID NO: 65) (defined on
the right of the plot).

Figure 4. Kinetic analysis of combined S1 and S2 binding site subtilisin mutants versus substrates having
variable P1 and P2 residues. The kinetic constants kcat/Km were determined from plots of initial rates versus
substrate concentration for the tetrapeptide series succinyl-Ala-Ala-Xaa2-Xaa1-pNA (SEQ ID NO: 71), where
Xaa2-Xaa1 was Lys-Lys (SEQ ID NO: 66), Lys-Arg (SEQ ID NO: 67), Lys-Phe (SEQ ID NO: 62), Pro-Lys (SEQ
ID NO: 58), Pro-Phe (SEQ ID NO: 56), or Ala-Phe (SEQ ID NO: 63) (defined on the right of the plot).

Figure 5. Results of hGH-AP fusion protein assay. hGH-AP fusion proteins were constructed, bound to
hGHbp-coupled resin, and treated with 0.5 nM N62D/G166D subtilisin in 20 mM Tris-Cl pH 8.2. Aliquots were
withdrawn at various times and AP release was monitored by activity assay in comparison to a standard curve.
Arrows indicate the cleavage site. The rate of cleavage of fusion proteins containing various substrate linkers is
shown. Substrates containing a Pro at position P1' are not cleaved.

Figure 6 (panels collectively referred to herein as Fig. 6). DNA sequence of the phagemid pSS5
containing the N62D/G166D double mutant subtilisin BPN' gene (SEQ ID NO: 1), and translated amino acid
sequence for the mutant preprosubtilisin (SEQ ID NO: 2). The pre region is comprised of residues -107 to -78,
the pro of residues -77 to -1, and the mature enzyme of residues +1 to +275 (SEQ ID NO: 72). Also shown are
restriction sites recognized by endonucleases that require 6 or more specific bases in succession.

Figure 7. Structure of a succinyl-Ala-Ala-Pro-BoroPhe (SEQ ID NO: 69) inhibitor bound to the active
site of subtilisin BPN' showing the S1, S2, and S4 binding pocket residues subjected to mutagenesis.

Figure 8. DNA sequence of the N62D/Y104D/G166D triple mutant (SEQ ID NO: 74) as well as the
translated amino acid sequence (SEQ ID NO: 75). The pre region is comprised of residues -107 to -78, the pro
of residues -77 to -1 and the mature enzyme of residues +1 to +275. The pro region reflects the changes
A(-4)R/A(-2)K/Y(-1)R made in the wild-type processing site to affect expression.

DETAILED DESCRIPTION OF THE INVENTION

Terms used in the claims and specification are defined as set forth below unless otherwise specified.

The term amino acid or amino acid residue, as used herein, refers to naturally occurring L amino acids or
residues unless otherwise specifically indicated. The commonly used one- and three-letter abbreviations for
amino acids are used herein (Lehninger, A. L., Biochemistry, 2d ed., pp. 71-92, Worth Publishers, N.Y. (1975)).
Basic amino acids are Arg and Lys. Acidic amino acids are Asp and Glu.

Substrates are described in triple or single letter code as Pn...P2-P1-P1'-P2'...Pn'. The "P1" residue refers
to the position preceding (i.e., N-terminal to) the scissile peptide bond (i.e., between the P1 and P1' residues)
of the substrate as defined by Schechter and Berger (Schechter, I. and Berger, A., Biochem. Biophys. Res.
Commun. 27:157-162 (1967)). Similarly, the term P1' is used to refer to the position following (i.e., C-terminal
to) the scissile peptide bond of the substrate. Increasing numbers refer to the next consecutive position preceding
(e.g., P2 and P3) and following (e.g., P2' and P3') the scissile bond. According to the present invention the
scissile peptide bond is that bond that is cleaved by the subtilisin variants of the instant invention.
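
The Schechter and Berger numbering can be made concrete with a small Python sketch that labels each residue of a peptide relative to a chosen scissile bond. The function name and the two residues placed after P1 in the example are hypothetical and serve only to show how primed positions are numbered.

```python
def label_positions(seq: str, scissile_after: int) -> list:
    """Label residues relative to the scissile bond following seq[scissile_after].

    Residues on the N-terminal side are P1, P2, ... counting away from the
    bond; residues on the C-terminal side are P1', P2', ... counting away
    from the bond.
    """
    if not 0 <= scissile_after < len(seq) - 1:
        raise ValueError("the scissile bond must lie between two residues")
    labels = []
    for i, residue in enumerate(seq):
        if i <= scissile_after:
            labels.append((f"P{scissile_after - i + 1}", residue))
        else:
            labels.append((f"P{i - scissile_after}'", residue))
    return labels


# Asn-Arg-Met-Arg-Lys is the exemplary P5-P1 substrate from the summary;
# the trailing "SA" is an arbitrary primed-side extension for illustration.
for label, residue in label_positions("NRMRKSA", scissile_after=4):
    print(label, residue)
```

With the bond placed after the Lys, the five leading residues receive the labels P5 through P1, matching the assignment given for SEQ ID NO: 76.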
"Subtilisins," "precursor subtilisin" and the like are bacterial carbonyl hydrolases which generally act to
cleave peptide bonds of proteins or peptides. As used herein, "subtilisin" means a naturally occurring subtilisin
or a recombinant subtilisin. A series of naturally occurring subtilisins are known to be produced and often
secreted by various bacterial species (Siezen, R.J., et al., (1991) Protein Engineering 4:719-737). Amino acid
sequences of the members of this series are not entirely homologous. However, the subtilisins in this series
exhibit the same or similar type of proteolytic activity. This class of serine proteases shares a common amino
acid sequence defining a catalytic triad which distinguishes them from the chymotrypsin related class of serine
proteases. The subtilisins and chymotrypsin related serine proteases both have a catalytic triad comprising
aspartate, histidine and serine. In the subtilisin related proteases the relative order of these amino acids, reading
from the amino to carboxy terminus, is aspartate-histidine-serine. In the chymotrypsin related proteases the
relative order, however, is histidine-aspartate-serine. Thus, subtilisins as used herein refer to a serine protease
having the catalytic triad of subtilisin related proteases.
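
The ordering rule that separates the two protease classes can be expressed as a simple comparison of residue numbers. The following Python sketch is illustrative only: the function name is hypothetical, and the chymotrypsin numbering used in the example (His57, Asp102, Ser195) is the conventional chymotrypsinogen numbering, which is not taken from this specification.

```python
def triad_family(asp: int, his: int, ser: int) -> str:
    """Classify a serine protease by the sequence order of its catalytic triad.

    Arguments are the residue numbers of the catalytic Asp, His and Ser,
    counted from the amino terminus.
    """
    if asp < his < ser:
        return "subtilisin-like"      # aspartate-histidine-serine
    if his < asp < ser:
        return "chymotrypsin-like"    # histidine-aspartate-serine
    return "unclassified"


# Subtilisin BPN' triad as given in this specification: Asp32/His64/Ser221.
print(triad_family(asp=32, his=64, ser=221))
# Chymotrypsin triad in conventional numbering (assumption, not from the text).
print(triad_family(asp=102, his=57, ser=195))
```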

Generally, subtilisins are serine endoproteases having molecular weights of about 27,500 which are
secreted in large amounts from a wide variety of Bacillus species. The protein sequences of subtilisins have been
determined from at least four different species of Bacillus (Markland, F.S., et al. (1971) in The Enzymes, ed.
Boyer, P.D., Acad. Press, New York, Vol. III, pp. 561-608; and Nedkov, P. et al. (1983) Hoppe-Seyler's Z.
Physiol. Chem. 364:1537-1540). The three-dimensional crystallographic structures of four subtilisins have been
reported (BPN' from Bacillus amyloliquefaciens, Hirono et al. (1984) J. Mol. Biol. 178:389-413; subtilisin
Carlsberg from Bacillus licheniformis, Bode et al., (1986) EMBO J., 5:813-818; thermitase from
Thermoactinomyces vulgaris, Gros et al., (1989) J. Mol. Biol. 210:347-367; and proteinase K from Tritirachium
album, Betzel, et al., (1988) Acta Crystallogr., B, 44:163-172). The three-dimensional structure of subtilisin
BPN' (from B. amyloliquefaciens) at 2.5 Å resolution has also been reported by Wright, C.S., et al. (1969) Nature
221:235-242 and Drenth, J. et al. (1972) Eur. J. Biochem., 26:177-181. These studies indicate that although
subtilisin is genetically unrelated to the mammalian serine proteases, it has a similar active site structure.
The x-ray crystal structures of subtilisin containing covalently bound peptide inhibitors (Robertus, J.D., et al.
(1972) Biochemistry 11:2439-2449), product complexes (Robertus, J.D., et al. (1972) Biochemistry 11:4293-
4303), and transition state analogs (Matthews, D.A., et al. (1975) J. Biol. Chem. 250:7120-7126 and Poulos,
T.L., et al. (1976) J. Biol. Chem. 251:1097-1103), which have been reported, have also provided information
regarding the active site and putative substrate binding cleft of subtilisins. In addition, a large number of kinetic
and chemical modification studies have been reported for subtilisins (Philipp, M., et al. (1983) Mol. Cell.
Biochem. 51:5-32; Svendsen, I.B. (1976) Carlsberg Res. Comm. 41:237-291 and Markland, F.S. Id.) as well
as at least one report wherein the side chain of methionine at residue 222 of subtilisin was converted by hydrogen
peroxide to methionine-sulfoxide (Stauffer, D.C., et al. (1965) J. Biol. Chem. 244:5333-5338).

"Subtilisin variant," "subtilisin mutant" and the like refer to a subtilisin-type serine protease having a
sequence which is not found in nature that is derived from a precursor subtilisin according to the present
invention. The subtilisin variant has a substrate specificity different from the precursor subtilisin by virtue of
amino acid substitutions within the precursor subtilisin amino acid sequence. The term is meant to include
subtilisin variants in which the DNA sequence encoding the precursor subtilisin is modified to produce a mutant
DNA sequence which encodes the substitution of one or more amino acids in the naturally occurring subtilisin
amino acid sequence. Suitable methods to produce such modification include those disclosed in U.S. Patent Nos.
4,760,025 and 5,371,008 and in EPO Publication Nos. 0130756 and 0251446.

A change in substrate specificity is defined as a difference between the kcat/Km ratio of the precursor
subtilisin and the subtilisin variant. The kcat/Km ratio is a measure of catalytic efficiency. Subtilisin variants
with increased or decreased kcat/Km ratios compared to the precursor subtilisin from which they were derived
are described herein. Generally, the objective is to secure a variant having a greater, i.e. numerically larger,
kcat/Km ratio for a given substrate. A greater kcat/Km ratio for a particular substrate indicates that the variant
may be used to more efficiently cleave the target substrate.

The specificity or discrimination between two or more competing substrates is determined by the ratios
of kcat/Km (Fersht, A.R., (1985) in Enzyme Structure and Mechanism, W.H. Freeman and Co., N.Y. p. 112). An
increase in kcat/Km ratio for one substrate may be accompanied by a reduction in kcat/Km ratio for another
substrate. This shift in substrate specificity indicates that the variant subtilisin with the increased kcat/Km ratio
for the substrate has utility in cleaving the particular substrate over the precursor subtilisin in, for example,
preventing undesirable hydrolysis of a particular substrate in a mixture of substrates.
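
As a worked illustration of how kcat/Km ratios determine discrimination, the Python sketch below computes the catalytic efficiency, the relative cleavage rate of two competing substrates, and the change in discrimination between a precursor and a variant. All function names and all numerical values are hypothetical and are not measurements from this specification; the rate ratio uses the standard competing-substrate relationship in which the relative rates equal the ratio of (kcat/Km) multiplied by substrate concentration.

```python
def catalytic_efficiency(kcat: float, km: float) -> float:
    """Return kcat/Km in M^-1 s^-1, given kcat in s^-1 and Km in M."""
    if km <= 0:
        raise ValueError("Km must be positive")
    return kcat / km


def rate_ratio(eff_a: float, eff_b: float,
               conc_a: float = 1.0, conc_b: float = 1.0) -> float:
    """Relative cleavage rate of substrate A over competing substrate B."""
    return (eff_a * conc_a) / (eff_b * conc_b)


def specificity_shift(variant_a: float, variant_b: float,
                      precursor_a: float, precursor_b: float) -> float:
    """Fold change in discrimination (A over B) from precursor to variant."""
    return rate_ratio(variant_a, variant_b) / rate_ratio(precursor_a, precursor_b)


# Hypothetical efficiencies: substrate A is dibasic, substrate B is hydrophobic.
precursor = {"A": 1e2, "B": 1e5}
variant = {"A": 1e5, "B": 1e3}
print(rate_ratio(variant["A"], variant["B"]))
print(specificity_shift(variant["A"], variant["B"],
                        precursor["A"], precursor["B"]))
```

In this invented example the gain in discrimination comes from both a higher kcat/Km for substrate A and a lower kcat/Km for substrate B, which is the kind of shift the paragraph above describes.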

In general, for a subtilisin variant to have a useful catalytic efficiency for cleavage of a particular substrate,
the kcat/Km ratio will generally be between 1 x 10^[exponent illegible] M^-1 s^-1 and about 1 x 10^[exponent illegible] M^-1 s^-1. More often, the kcat/Km ratio
will be between about 1 x 10^[exponent illegible] M^-1 s^-1 and 1 x 10^[exponent illegible] M^-1 s^-1.

When referring to mutants or variants, the wild type amino acid residue is followed by the residue number 
and the new or substituted amino acid residue. For example, substitution of D for wild type N in residue position 
62 is denominated N62D. 

"Subtilisin variants or mutants" are designated in the same manner by using the single letter amino acid
code for the wild-type residue followed by its position and the single letter amino acid code of the replacement
residue. Multiple mutants are indicated by component single mutants separated by slashes. Thus the subtilisin
BPN' variant N62D/G166D is a di-substituted variant in which Asp replaces Asn and Gly at residue positions
62 and 166, respectively, in wild-type subtilisin BPN'.
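
The naming convention lends itself to mechanical parsing. The following Python sketch, with hypothetical names, splits a variant designation on slashes and extracts the wild-type residue, position and replacement residue from each component.

```python
import re
from typing import NamedTuple


class Substitution(NamedTuple):
    wild_type: str    # single letter code of the wild-type residue
    position: int     # residue number in the mature enzyme
    replacement: str  # single letter code of the new residue


# One component such as "N62D": letter, digits, letter.
_COMPONENT = re.compile(r"([A-Z])(\d+)([A-Z])")


def parse_variant(name: str) -> list:
    """Parse a designation such as 'N62D/G166D' into Substitution records."""
    substitutions = []
    for part in name.split("/"):
        match = _COMPONENT.fullmatch(part.strip())
        if match is None:
            raise ValueError(f"not a substitution: {part!r}")
        substitutions.append(
            Substitution(match.group(1), int(match.group(2)), match.group(3))
        )
    return substitutions


for sub in parse_variant("N62D/Y104D/G166D"):
    print(f"{sub.wild_type} at position {sub.position} replaced by {sub.replacement}")
```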
An amino acid residue of a precursor carbonyl hydrolase is "equivalent" to a residue of B.
amyloliquefaciens subtilisin if it is either homologous (i.e., corresponding in position in either primary or tertiary
structure) or analogous to a specific residue or portion of that residue in B. amyloliquefaciens subtilisin (i.e.,
having the same or similar functional capacity to combine, react, or interact chemically).
In order to establish homology to primary structure, the amino acid sequence of a precursor carbonyl
hydrolase is directly compared to the B. amyloliquefaciens subtilisin primary sequence and particularly to a set
of residues known to be invariant in all subtilisins for which the sequences are known (see e.g. Figure 5-C in EPO
0251446). After aligning the conserved residues, allowing for necessary insertions and deletions in order to
maintain alignment (i.e. avoiding the elimination of conserved residues through arbitrary deletion and insertion),
the residues equivalent to particular amino acids in the primary sequence of B. amyloliquefaciens subtilisin are
defined. Alignment of conserved residues should conserve 100% of such residues. However, alignment of
greater than 75% or as little as 50% of conserved residues is also adequate to define equivalent residues.
Conservation of the catalytic triad, Asp32/His64/Ser221, is required.

Equivalent residues homologous at the level of tertiary structure for a precursor carbonyl hydrolase whose
tertiary structure has been determined by x-ray crystallography are defined as those for which the atomic
coordinates of 2 or more of the main chain atoms of a particular amino acid residue of the precursor carbonyl
hydrolase and B. amyloliquefaciens subtilisin (N on N, CA on CA, C on C, and O on O) are within 0.13 nm and
preferably 0.1 nm after alignment. Alignment is achieved after the best model has been oriented and positioned
to give the maximum overlap of atomic coordinates of non-hydrogen protein atoms of the carbonyl hydrolase
in question to the B. amyloliquefaciens subtilisin. The best model is the crystallographic model giving the lowest
R factor for experimental diffraction data at the highest resolution available.



\[
R\ \text{factor} = \frac{\sum_{h} \bigl|\, |F_o(h)| - |F_c(h)| \,\bigr|}{\sum_{h} |F_o(h)|}
\]



Equivalent amino acid residues of subtilisin BPN', subtilisin Carlsberg, thermitase and proteinase K from tertiary
structure analysis are provided in, for example, Siezen, et al., (1991) Prot. Eng. 4:719-737.
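
The main chain distance criterion for tertiary-structure equivalence can be sketched in Python as follows. The sketch assumes the two structures have already been superimposed, that coordinates are expressed in nanometres, and that each residue is represented as a dictionary from atom name to an (x, y, z) tuple; the function name and the example coordinates are hypothetical.

```python
import math

# The four main chain atoms compared pairwise: N on N, CA on CA, C on C, O on O.
MAIN_CHAIN = ("N", "CA", "C", "O")


def equivalent_by_structure(res_a: dict, res_b: dict,
                            cutoff_nm: float = 0.13,
                            min_atoms: int = 2) -> bool:
    """Return True if at least min_atoms main chain atoms lie within cutoff_nm.

    res_a and res_b map atom names to (x, y, z) coordinates in nm, taken
    from two structures that have already been aligned.
    """
    close = 0
    for atom in MAIN_CHAIN:
        if atom in res_a and atom in res_b:
            if math.dist(res_a[atom], res_b[atom]) <= cutoff_nm:
                close += 1
    return close >= min_atoms


# Invented coordinates: N and CA superimpose closely, C and O do not.
precursor_residue = {"N": (0.00, 0.00, 0.0), "CA": (0.15, 0.00, 0.0),
                     "C": (0.25, 0.10, 0.0), "O": (0.25, 0.22, 0.0)}
reference_residue = {"N": (0.05, 0.00, 0.0), "CA": (0.15, 0.10, 0.0),
                     "C": (0.50, 0.10, 0.0), "O": (0.25, 0.50, 0.0)}
print(equivalent_by_structure(precursor_residue, reference_residue))
```

Passing `cutoff_nm=0.1` applies the preferred, stricter threshold mentioned in the text.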

Equivalent residues which are functionally analogous to a specific residue of B. amyloliquefaciens
subtilisin are defined as those amino acids of the precursor carbonyl hydrolases which may adopt a conformation
such that they either alter, modify or contribute to protein structure, substrate binding or catalysis in a manner
defined and attributed to a specific residue of the B. amyloliquefaciens subtilisin as described herein. Further,
they are those residues of the precursor carbonyl hydrolase (for which a tertiary structure has been obtained by
x-ray crystallography), which occupy an analogous position to the extent that although the main chain atoms of
the given residue may not satisfy the criteria of equivalence on the basis of occupying a homologous position,
the atomic coordinates of at least two of the side chain atoms of the residue lie within 0.13 nm of the
corresponding side chain atoms of B. amyloliquefaciens subtilisin. The three dimensional structures would be
aligned as outlined above.

Some of the residues identified for substitution are conserved residues whereas others are not. In the case
of residues which are not conserved, the replacement of one or more amino acids is limited to substitutions which
produce a mutant which has an amino acid sequence that does not correspond to one found in nature. In the case
of conserved residues, such replacements should not result in a naturally occurring sequence. The subtilisin
mutants of the present invention include the mature forms of subtilisin mutants as well as the pro- and prepro-
forms of such subtilisin mutants. The prepro-forms are the preferred construction since this facilitates the
expression, secretion and maturation of the subtilisin mutants.

"Prosequence" refers to a sequence of amino acids bound to the N-terminal portion of the mature form of
a subtilisin which when removed results in the appearance of the "mature" form of the subtilisin. Many
proteolytic enzymes are found in nature as translational proenzyme products and, in the absence of post-
translational processing, are expressed in this fashion. The preferred prosequence for producing subtilisin
mutants, specifically subtilisin BPN' mutants, is the putative prosequence of B. amyloliquefaciens subtilisin
although other subtilisin prosequences may be used. For example, when the substrate specificity of the precursor
subtilisin is altered according to the present invention, this alteration may affect the ability of the variant
subtilisin to undergo autolytic cleavage of the naturally occurring prosequence. In order to affect the expression
and proper folding of a mature variant subtilisin whose substrate specificity has been altered, it may be necessary
to alter the prosequence to correspond to the new or variant substrate specificity.

As an example, the substrate specificity of a particular subtilisin variant N62D/Y104D/G166D is distinct
from the precursor subtilisin from which it was derived. The subtilisin variant prefers substrates containing basic
residues at substrate positions corresponding to P4, P2, and P1. According to this aspect of the present invention,
the precursor prosequence which was efficiently autolysed by the precursor subtilisin is altered to correspond
to the substrate specificity of the variant subtilisin. Therefore, for the subtilisin variant N62D/Y104D/G166D the
prosequence would be altered to contain basic residues at positions -4, -2, and -1.
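
The stated preference for basic residues at P4, P2 and P1 can be expressed as a simple check on a P4-P3-P2-P1 tetrapeptide. The following is a minimal sketch only: the function name is hypothetical, and treating Lys and Arg (but not His) as the basic residues is an assumption made for illustration.

```python
# Residues counted as "basic" in this sketch (assumption: Lys and Arg only).
BASIC = {"K", "R"}


def is_tribasic_site(p4_to_p1: str) -> bool:
    """Return True if a P4-P3-P2-P1 tetrapeptide is basic at P4, P2 and P1.

    The residue at P3 is not constrained.
    """
    if len(p4_to_p1) != 4:
        raise ValueError("expected exactly four residues (P4, P3, P2, P1)")
    p4, _p3, p2, p1 = p4_to_p1.upper()
    return p4 in BASIC and p2 in BASIC and p1 in BASIC
```

Under these assumptions a site such as RAKR satisfies the check, while AAKR (basic only at P2 and P1) and AAPF do not.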

A "signal sequence" or "presequence" refers to any sequence of amino acids bound to the N-terminal
portion of a subtilisin or to the N-terminal portion of a prosubtilisin which may participate in the secretion of the
mature or pro forms of the subtilisin. This definition of signal sequence is a functional one, meant to include all
those amino acid sequences, encoded by the N-terminal portion of the subtilisin gene or other secretable carbonyl
hydrolases, which participate in the effectuation of the secretion of subtilisin or other carbonyl hydrolases under
native conditions. The present invention utilizes such sequences to effect the secretion of the subtilisin mutants
as defined herein.

A "prepro" form of a subtilisin mutant consists of the mature form of the subtilisin having a prosequence
operably linked to the amino-terminus of the subtilisin and a "pre" or "signal" sequence operably linked to the
amino terminus of the prosequence.

"Expression vector" refers to a DNA construct containing a DNA sequence which is operably linked to
a suitable control sequence capable of effecting the expression of the DNA in a suitable host. Such control
sequences include a promoter to effect transcription, an optional operator sequence to control such transcription,
a sequence encoding suitable mRNA ribosome binding sites, and sequences which control termination of
transcription and translation. The vector may be a plasmid, a phage particle, or simply a potential genomic insert.
Once transformed into a suitable host, the vector may replicate and function independently of the host genome,
or may, in some instances, integrate into the genome itself. In the present specification, "plasmid" and "vector"
are sometimes used interchangeably as the plasmid is the most commonly used form of vector at present.
However, the invention is intended to include such other forms of expression vectors which serve equivalent
functions and which are, or become, known in the art.

The "host cells" used in the present invention generally are procaryotic or eucaryotic hosts which
preferably have been manipulated by the methods disclosed in EPO Publication No. 0130756 or 0251446 or
U.S. Patent No. 5,371,008 to render them incapable of secreting enzymatically active endoprotease. A preferred host
cell for expressing subtilisin is the Bacillus strain BG2036 which is deficient in enzymatically active neutral
protease and alkaline protease (subtilisin). The construction of strain BG2036 is described in detail in EPO
Publication No. 0130756 and further described by Yang, M.Y., et al. (1984) J. Bacteriol. 160:15-21. Such host
cells are distinguishable from those disclosed in PCT Publication No. 03949 wherein enzymatically inactive
mutants of intracellular proteases in E. coli are disclosed. Other host cells for expressing subtilisin include
Bacillus subtilis var. I168 (EPO Publication No. 0130756).

Host cells are transformed or transfected with vectors constructed using recombinant DNA techniques.
Such transformed host cells are capable of either replicating vectors encoding the subtilisin mutants or expressing
the desired subtilisin mutant. In the case of vectors which encode the pre or prepro form of the subtilisin mutant,
such mutants, when expressed, are typically secreted from the host cell into the host cell medium.

"Operably linked" when describing the relationship between two DNA regions simply means that they are
functionally related to each other. For example, a presequence is operably linked to a peptide if it functions as
a signal sequence, participating in the secretion of the mature form of the protein most probably involving
cleavage of the signal sequence. A promoter is operably linked to a coding sequence if it controls the
transcription of the sequence; a ribosome binding site is operably linked to a coding sequence if it is positioned
so as to permit translation.

The genes encoding the naturally-occurring precursor subtilisin may be obtained in accord with the general
methods described in U.S. Patent No. 4,760,025 or EPO Publication No. 0130756. As can be seen from the
examples disclosed therein, the methods generally comprise synthesizing labeled probes having putative
sequences encoding regions of the hydrolase of interest, preparing genomic libraries from organisms expressing
the hydrolase, and screening the libraries for the gene of interest by hybridization to the probes. Positively
hybridizing clones are then mapped and sequenced.

The cloned subtilisin is then used to transform a host cell in order to express the subtilisin. The subtilisin
gene is then ligated into a high copy number plasmid. This plasmid replicates in hosts in the sense that it contains
the well-known elements necessary for plasmid replication: a promoter operably linked to the gene in question
(which may be supplied as the gene's own homologous promoter if it is recognized, i.e., transcribed, by the host),
a transcription termination and polyadenylation region (necessary for stability of the mRNA transcribed by the
host from the hydrolase gene in certain eucaryotic host cells) which is exogenous or is supplied by the
endogenous terminator region of the subtilisin gene and, desirably, a selection gene such as an antibiotic
resistance gene that enables continuous cultural maintenance of plasmid-infected host cells by growth in
antibiotic-containing media. High copy number plasmids also contain an origin of replication for the host,
thereby enabling large numbers of plasmids to be generated in the cytoplasm without chromosomal limitations.
However, it is within the scope herein to integrate multiple copies of the subtilisin gene into the host genome. This
is facilitated by procaryotic and eucaryotic organisms which are particularly susceptible to homologous
recombination.

Once the subtilisin gene has been cloned, a number of modifications are undertaken to enhance the use
of the gene beyond synthesis of the naturally-occurring precursor subtilisin. Such modifications include the
production of recombinant subtilisin as disclosed in U.S. Patent No. 5,371,008 or EPO Publication No. 0130756
and the production of subtilisin mutants described herein.

Mutant design and preparation.

A. Subtilisin Variants Capable of Cleaving Substrates Having Dibasic Residues.

For the preparation of subtilisin variants capable of cleaving substrates containing dibasic residues, the 
following analysis was undertaken. 

A number of structures have been solved of subtilisin with a variety of inhibitors and transition state
analogs bound (Wright, C. S., Alden, R. A. and Kraut, J. (1969) Nature, 221:235-242; McPhalen, C. A. and
James, M. N. G. (1988) Biochemistry, 27:6582-6598; Bode, W., Papamokos, E., Musil, D., Seemueller, U. and Fritz,
H. (1986) EMBO J., 5:813-818; and Bott, R., Ultsch, M., Kossiakoff, A., Graycar, T., Katz, B. and Power, S.
(1988) J. Biol. Chem., 263:7895-7906). One of these structures, Figure 1, was used to locate residues that are
in close proximity to side chains at the P1 and P2 positions from the substrate. Previous work had shown that
replacement of residues at positions 156 and 166 in the S1 binding site with various charged residues led to
improved specificity for complementary charged substrates (Wells, J. A., Powers, D. B., Bott, R. R., Graycar,
T. P. and Estell, D. A. (1987) Proc. Natl. Acad. Sci. USA, 84:1219-1223). Although longer range electrostatic
effects on substrate specificity have been noted (Russell, A. J. and Fersht, A. R. (1987) Nature, 328:496-500),
these were generally much smaller than local ones. Therefore, it seemed reasonable that local differences in
charge between subtilisin BPN' and the eukaryotic enzymes may account for the differences in specificity.

A detailed sequence alignment of 35 different subtilisin-like enzymes (Siezen, R. J., de Vos, W. M.,
Leunissen, A. M., and Dijkstra, B. W. (1991) Prot. Eng., 4:719-737) allowed us to identify differences between
subtilisin BPN' and the eukaryotic processing enzymes KEX2, furin and PC2. Within the S1 binding pocket
there are a number of charged residues that appear in the pro-hormone processing enzymes and not in subtilisin
BPN' (Table 1A).

TABLE 1A
S1 subsite

                     125-131*            151-157             163-168

Subtilisin BPN'      SLGGPSG             AAAGNEG             ST-VGYP
                                         (SEQ ID NO: 4)      (SEQ ID NO: 5)

Kex2                 SWGPADD             FASGNGG             CNYDGYT
                     (SEQ ID NO: 6)      (SEQ ID NO: 7)      (SEQ ID NO: 8)

Furin                SWGPEDD             WASGNGG             CNCDGYT
                     (SEQ ID NO: 9)      (SEQ ID NO: 10)     (SEQ ID NO: 11)

PC2                  SWGPADD             WASGDGG             CNCDGYA
                     (SEQ ID NO: 6)      (SEQ ID NO: 12)     (SEQ ID NO: 13)

* numbering according to subtilisin BPN' sequence

For example, the eukaryotic enzymes have two conserved Asp residues at 130 and 131 as well as an Asp at 165
that is preceded by insertion of a Tyr or Cys. However, in the region from 151-157, subtilisin BPN' contains a
Glu and the eukaryotes a conserved Gly.

In the S2 binding site there were two notable differences in sequence (Table 1B).

TABLE 1B
S2 subsite

                     30-35               60-64

Subtilisin BPN'      VIDSGI              DNNSH
                     (SEQ ID NO: 14)     (SEQ ID NO: 15)

KEX2                 IVDDGL              SDDYH
                     (SEQ ID NO: 16)     (SEQ ID NO: 17)

Furin                ILDDGI              NDNRH
                     (SEQ ID NO: 18)     (SEQ ID NO: 19)

PC2                  IMDDCI              WFNSH
                     (SEQ ID NO: 20)     (SEQ ID NO: 21)

Subtilisin BPN' contains a Ser at position 33 whereas the pro-hormone processing enzymes contain Asp. There
is not as clear a consensus in the region of 60-64, but one notable difference is at position 62. This side chain,
which points directly at the P2 side chain (Figure 1), is Asn in subtilisin BPN', furin and PC2 but Asp in KEX2.
Thus, not all substitutions were clearly predictive of the specificity differences.
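
The comparison made from the S2 alignment can be reproduced column by column. The sketch below is illustrative only: the function name is hypothetical, the segments are taken as listed in Table 1B and assumed to be ungapped, and "acidic" is taken to mean Asp or Glu.

```python
ACIDIC = {"D", "E"}  # Asp and Glu


def acidic_only_in_processing_enzymes(start, bpn_segment, other_segments):
    """List alignment columns that are acidic in every processing enzyme
    but not acidic in subtilisin BPN'.

    start          -- BPN' residue number of the first column
    bpn_segment    -- BPN' residues for the aligned region
    other_segments -- aligned residues of the processing enzymes
    Returns (position, BPN' residue, residues in the other enzymes).
    """
    hits = []
    for offset, bpn_residue in enumerate(bpn_segment):
        column = [segment[offset] for segment in other_segments]
        if bpn_residue not in ACIDIC and all(r in ACIDIC for r in column):
            hits.append((start + offset, bpn_residue, "".join(column)))
    return hits


# Segments as listed in Table 1B (KEX2, furin, PC2 in that order).
region_30_35 = acidic_only_in_processing_enzymes(
    30, "VIDSGI", ["IVDDGL", "ILDDGI", "IMDDCI"])
region_60_64 = acidic_only_in_processing_enzymes(
    60, "DNNSH", ["SDDYH", "NDNRH", "WFNSH"])
```

With these inputs only position 33 is flagged in the 30-35 region and nothing is flagged in the 60-64 region, in line with the observation that position 62 is Asp in KEX2 alone.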

A variety of mutants were produced to probe and engineer the specificity of subtilisin BPN' using
oligonucleotides described in Table 2.

TABLE 2
Oligonucleotides used for site-directed mutagenesis on subtilisin.

Mutant                    Oligonucleotide                                                   Specificity   Activity
                                                                                            Pocket        Expressed

S33D                      5'-GCGGTTATCGACG*A*CGGTATCGATTCT-3'                               S2
                          (SEQ ID NO: 22)

S33K                      5'-GCGGTTATCGACAA*A*GGTATCGATTCT-3'                               S2
                          (SEQ ID NO: 23)

S33E                      5'-GCGGTTATCGACG*A*A*GGTATCGATTCT-3'                              S2
                          (SEQ ID NO: 24)

N62D                      5'-CCAAGACAACG*ACTCTCACGGAA-3'                                    S2
                          (SEQ ID NO: 25)

N62S                      5'-CCAAGACAACAG*CTCTCACGGAA-3'                                    S2
                          (SEQ ID NO: 26)

N62K                      5'-CCAAGACAACAAA*TCTCACGGAA-3'                                    S2
                          (SEQ ID NO: 27)

G166D                     5'-CACTTCCGGCAGCTCG*T*C*G*ACAGTGGA*C*TACCCTGGCAAATA-3'            S1
                          (SEQ ID NO: 28) (Inserts Sal I site)

G166E                     5'-CACTTCCGGCAGCTCG*T*C*G*ACAGTGGA*GTACCCTGGCAAATA-3'             S1
                          (SEQ ID NO: 29) (Inserts Sal I site)

G128P/P129A               5'-TTAACATGAGCCTCGGCC*C*AG*CTA*G*C*GGTTCTGCTGCTTTA-3'             S1
                          (SEQ ID NO: 30) (Inserts Nhe I site)

G128P/P129A/              5'-TTAACATGAGCCTCGGCC*C*C*G*CGG*A*TGA*TTCTGCTGCTTTAAA-3'          S1
S130D/G131D               (SEQ ID NO: 31) (Inserts Sac II site)

T164N/V165D               5'-CGGCAGCTCAAGCA*A*C*G*A*T*GGCTAT*CCTGGCAAATACCCTTCTGTCA-3'      S1
                          (SEQ ID NO: 32) (Inserts BsaBI site)

T164Y/V165D               5'-CGGCAGCTCAAGCT*A*C*G*A*T*GGCTAT*CCTGGCAAATACCCTTCTGTCA-3'      S1
                          (SEQ ID NO: 33)

T164N-Y(insert)-V165D     5'-ACTTCCGGCAGCTCT*T*C*G*AA*C*T*A*C*G*AC*GGGTACCCTGGCAAATA-3'     S1
                          (SEQ ID NO: 34) (Inserts BstBI site)

N62D/G166D                See individual mutations                                          S1/S2

N62D/G166E                See individual mutations                                          S1/S2

* Asterisks indicate base changes.

After producing the mutant plasmids they were transformed into a protease deficient strain of B. subtilis
(BG2036) that lacks an endogenous gene for secretion of subtilisin. These were then tested for protease activity
on skim milk plates.

The first set of mutants tested were ones where segments of the S1 binding site were replaced with
sequences from KEX2. None of these segment replacements produced detectable activity on skim milk plates
even though variants of subtilisin whose catalytic efficiencies are reduced by as much as 1000-fold do produce
detectable halos (Wells, J. A., Cunningham, B. C., Graycar, T. P. and Estell, D. A. (1986) Philos. Trans. R. Soc.
Lond. A, 317:415-423). We went on to produce single residue
substitutions that should have less impact on the stability. These mutants at position 166 in the S1 site, and 33
and 62 in the S2 site, were chosen based on the modeling and sequence considerations described above.
Fortunately all single mutants as well as combination mutants produced activity on skim milk plates and could
be purified to homogeneity.

Kinetic analysis of variant subtilisins.
To probe the effects of the G166E and G166D on specificity at the P1 position we used substrates having
the form suc-AAPX-pna (SEQ ID NO: 69) where X was either Lys (SEQ ID NO. 58), Arg (SEQ ID NO. 59),
Phe (SEQ ID NO. 56), Met (SEQ ID NO. 60) or Gln (SEQ ID NO. 61). The kcat/Km values were determined
from initial rate measurements and results reported in Figure 2. Whereas the wild-type enzyme preferred
Phe>Met>Lys>Arg>Gln, the G166E preferred Lys~Phe>Arg~Met>Gln, and G166D preferred
Lys>Phe~Arg~Met>Gln. Thus, both the acidic substitutions at position 166 caused a shift in preference for basic
residues at the P1 site, as previously reported (Wells, J. A., Powers, D. B., Bott, R. R., Graycar, T. P. and Estell,
D. A. (1987a) Proc. Natl. Acad. Sci. USA 84:1219-1223).

The effects of single and double substitutions in the S2 binding site were analyzed with substrates having
the form suc-Ala-Ala-Xaa-Phe-pna (SEQ ID NO. 70) and are shown in Figure 3. At the P2 position the wild-type
enzyme preferred Ala>Pro>Lys>Arg>Asp. In contrast, the S33D preferred Ala>Lys~Arg~Pro>Asp and
the N62D preferred Lys>Ala>Arg>Pro>Asp. Although the effects were more dramatic for the N62D mutant,
the S33D variant also showed significant improvement toward basic P2 residues and corresponding reduction
in hydrolysis of the Ala and Asp P2 substrates. We then analyzed the double mutant, but found it exhibited the
catalytic efficiency of the worse of the two single mutants for each of the substrates tested.

Despite the less than additive effects seen for the two charged substitutions in the S2 site, we decided to
combine the best S2 site variant (N62D) with either of the acidic substitutions in the S1 site. The two double
mutants, N62D/G166E and N62D/G166D, were analyzed with substrates having the form suc-AAXX-pna (SEQ
ID NO. 71) where XX was either KK (SEQ ID NO. 66), KR (SEQ ID NO. 67), KF (SEQ ID NO. 62), PK (SEQ
ID NO. 58), PF (SEQ ID NO. 56) or AF (SEQ ID NO. 63) (Figure 4). The wild-type preference was
AF>PF~KF>KK~PK>KR, whereas the double mutants had the preference KK>KR>KF>PK~AF>PF. Thus for
the double mutants there was a dramatic improvement toward cleavage of dibasic substrates and away from
cleaving the hydrophobic substrates.

The greater than additive effect (or synergy) of these mutants can be seen from ratios of the catalytic
efficiencies for the single and multiple mutants. For example, the G166E variant cannot distinguish Lys from
Phe at the P1 position. Yet the N62D/G166E variant cleaves the Lys-Lys substrate about 8 times faster than the
Lys-Phe substrate. Similarly the G166D cleaves the Lys P1 substrate about 3 times faster than the Phe P1
substrate, but the N62D/G166D double mutant cleaves a Lys-Lys substrate 15 times faster than a Lys-Phe
substrate. Thus, as opposed to the reduction in specificity seen for the double mutant in the S2 site, the S1-S2
double mutants enhance specificity for basic residues. It is possible that these two sites bind the dibasic
substrates in a cooperative manner analogous to a chelate effect.
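
The synergy argument amounts to dividing the Lys-over-Phe preference of each double mutant by that of the corresponding single S1 mutant. The sketch below uses the approximate figures quoted above (about 8-, 3- and 15-fold); taking the G166E Lys/Phe ratio as 1 is an assumption standing in for "cannot distinguish", and the function and variable names are hypothetical.

```python
def synergy(double_mutant_ratio, single_mutant_ratio):
    """Fold enhancement of P1 Lys-over-Phe discrimination gained by adding
    the N62D substitution to a single S1-site mutant."""
    return double_mutant_ratio / single_mutant_ratio


# Approximate Lys/Phe rate ratios taken from the text (illustrative values).
g166e_synergy = synergy(double_mutant_ratio=8.0, single_mutant_ratio=1.0)
g166d_synergy = synergy(double_mutant_ratio=15.0, single_mutant_ratio=3.0)
```

A value above 1 for both pairs is what distinguishes the S1-S2 combinations from the S33D/N62D double mutant, where combining substitutions gave no such gain.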

Therefore, according to the present invention, subtilisin mutants having a preference for dibasic residues
are preferred. According to this aspect of the present invention, substitutions of amino acids corresponding to
amino acids N62 and G166 of subtilisin BPN' produced from Bacillus amyloliquefaciens are prepared. In
particular, amino acids 62 and 166, or their equivalents, in the precursor subtilisin are substituted with amino
acid residues Asp or Glu. Preferred subtilisin variants according to this aspect of the invention include
N62D/G166D, N62E/G166E, N62E/G166D, and N62D/G166E variants of subtilisin BPN' and their equivalents.

B. Subtilisin Variants Capable of Cleaving Substrates Having Tribasic Residues

For the preparation of subtilisin variants specific for substrates containing a third basic residue at substrate
position P4 we used the crystal structure of subtilisin BPN' complexed with Ala-Ala-Pro-Phe-Boronate (SEQ ID
NO: 56) (Figure 7) in combination with sequence alignments of subtilisin BPN', KEX2, Furin, PC2, and P
(Table 3) in designing basic specificity into the S1 and S2 and S4 subsites. The two subtilisin BPN' residues
that most prominently display their side chains into the S4 pocket are Y104 and I107 (Figure 7).

Sequence alignments of subtilisin BPN' and the mammalian prohormone-processing proteases (Siezen, R.
J., de Vos, W. M., Leunissen, A. M., and Dijkstra, B. W. (1991) Prot. Eng. 4:719-737) (Table 3) reveal that
position 104 is conserved as Asp, and 107 as Glu in the prohormone converting (Arg-P4 specific) enzymes.
Therefore these two mutations were introduced either individually or in combination into the dibasic-specific
N62D/G166D subtilisin BPN' background (Table 4).

Table 3 Sequence alignments for the S4 site of subtilisins

                 S4 Site
                 100-110

Subtilisin       GSGQYSWIING       (SEQ ID NO: 77)
KEX2             GDITTEDEAAS       (SEQ ID NO: 78)
Furin            GEVTDAVEARS       (SEQ ID NO: 79)
PC2              PPMTDIIEASS       (SEQ ID NO: 80)
P                CrVTDAIEASS       (SEQ ID NO: 81)


Table 4 describes oligonucleotides used for site-directed mutagenesis, protein regions affected by the
mutations, and relative expression of protein for N62D/G166D subtilisin BPN' variants. Bold type indicates
base changes from the pSS5 (N62D/G166D) template. For "Protein Expressed," indicates a high level
of expression of mature enzyme in crude culture medium, and indicates no enzyme detectable.

Mutant                  Oligonucleotide                                   Protein region      Protein
                                                                                              Expressed

Y104D                   5'-GGTTCCGGCCAAGATAGCTGGATCATT-3'                 S4 pocket
                        (SEQ ID NO: 82)

I107E                   5'-CCAATACAGCTGGGAAATTAACGGAATCG-3'               S4 pocket
                        (SEQ ID NO: 83)

Y104D/I107E             5'-GGTTCCGGCCAAGATAGCTGGGAAATTAACG                S4 pocket
                        GAATCGA (SEQ ID NO: 84)

A(-4)R/A(-2)K/          5'-AAGAAGATCACCTAAGACATAAGCGCGCGC                 Processing site
Y(-1)R                  AGTCCGTGC-3' (SEQ ID NO: 85)

Y104D/A(-4)R/           See individual mutations                          S4 pocket +
A(-2)K/Y(-1)R                                                             Processing site

I107E/A(-4)R/           See individual mutations                          S4 pocket +
A(-2)K/Y(-1)R                                                             Processing site

Y104D/I107E/A(-4)R/     See individual mutations                          S4 pocket +
A(-2)K/Y(-1)R                                                             Processing site

Initial attempts to express the triple mutants in Bacillus were unsuccessful, as indicated by SDS-PAGE
of crude supernatants. We reasoned that the source of the expression problem could lie in the fact that
correct folding and maturation of subtilisin requires autolytic cleavage of its propeptide (Power, S. D.,
Adams, R. and Wells, J. A. (1986) Proc. Natl. Acad. Sci. USA 83, 3096-3100). The processing site in
the wild-type enzyme has a sequence that is optimized for the natural substrate preference, AHAY↓A (↓
denotes the site of cleavage). Although the N62D/G166D subtilisin can still autolyze itself with the wild-type
processing site, the additional S4 pocket mutations could reduce the cleavage to the point where
expression was lowered to a minute level.

To test whether the mutants were expressed poorly due to an inability to process themselves autolytically,
mutations in the processing site were simultaneously incorporated to accommodate the changes in substrate
specificity. Thus the sequence from positions -4 to -1 was changed from AHAY to RHKR in combination
with the S4 site mutations. For N62D/Y104D/G166D, high levels of expression could then be achieved,
providing an indication that the additional Y104D mutation induced an especially strong preference for P4
Arg over Ala. Variants containing the I107E mutation, however, could not be expressed even with the
change in the processing site.
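
The processing-site change can be written out as three point substitutions applied to the residues at positions -4 to -1. The following sketch is illustrative only; the function name and the label parser are hypothetical, and labels are assumed to follow the A(-4)R style used for the processing-site mutations.

```python
import re

# Matches labels such as "A(-4)R": wild-type residue, negative position, new residue.
MUTATION = re.compile(r"([A-Z])\((-\d+)\)([A-Z])")


def mutate_processing_site(site, mutations):
    """Apply prosequence point mutations to the residues preceding the
    cleavage site; the last residue of `site` is position -1."""
    residues = list(site)
    for label in mutations:
        match = MUTATION.fullmatch(label)
        if match is None:
            raise ValueError(f"unrecognised mutation label: {label!r}")
        wild_type, position, replacement = (
            match.group(1), int(match.group(2)), match.group(3))
        if not -len(residues) <= position <= -1:
            raise ValueError(f"position {position} lies outside the site")
        if residues[position] != wild_type:
            raise ValueError(f"{label}: site has {residues[position]} at {position}")
        residues[position] = replacement
    return "".join(residues)


new_site = mutate_processing_site("AHAY", ["A(-4)R", "A(-2)K", "Y(-1)R"])
```

Applied to AHAY, the three substitutions give RHKR, which places basic residues at -4, -2 and -1 while leaving His at -3 unchanged.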

Kinetic analysis of variant subtilisins
The mature N62D/Y104D/G166D variant was purified and analyzed for its ability to hydrolyze several
tetrapeptide-pNA substrates. Table 5 displays the results along with data for the N62D/G166D mutant and
wild-type subtilisin.

The tribasic substrates succinyl-RAKR-pNA (SEQ ID NO: 86) and succinyl-KAKR-pNA (SEQ ID
NO: 87) were hydrolyzed with high catalytic efficiency (kcat/Km) by the triple mutant, at a level similar to
wild-type subtilisin versus one of its best substrates, succinyl-AAPF-pNA (SEQ ID NO: 56). In contrast
the dibasic substrate succinyl-AAKR-pNA (SEQ ID NO: 67) was hydrolyzed 60-fold less efficiently, mostly
due to diminution of kcat. This indicates a dramatic specificity change from the wild-type preference at P4,
at which hydrophobic residues are strongly favored over charged side chains (Grøn, H. and Breddam, K.
(1992) Biochemistry 31, 8967-8971). In fact N62D/G166D subtilisin appears to cleave at an alternate site
in the succinyl-RAKR-pNA (SEQ ID NO: 86) substrate, indicating that Arg was not accepted in its wild-type
S4 site.

The large magnitude of the combined specificity changes in the N62D/Y104D/G166D variant is
evidenced by its strong discrimination against substrates that are preferred by the wild-type enzyme. For
example, succinyl-AAPF-pNA (SEQ ID NO: 56) is hydrolyzed 6 x 10^-fold less efficiently than succinyl-RAKR-pNA
(SEQ ID NO: 86). Clearly, the S4 site mutation greatly improves upon the discriminatory
power of the parent dibasic-specific N62D/G166D subtilisin, where the ratio of catalytic efficiency for
succinyl-AAKR-pNA versus succinyl-AAPF-pNA is 1.9 x 10^. The improvement in discrimination (310-fold)
is also higher than would be predicted from the data for hydrolysis of succinyl-RAKR-pNA (SEQ ID
NO: 86) versus succinyl-AAKR-pNA (SEQ ID NO: 67) by the triple mutant (a 60-fold effect).

Therefore in order to produce subtilisin variants capable of cleaving substrates containing basic
residues at positions P4, P2, and P1, additional site specific substitutions are made in the dibasic specific
subtilisin variants. According to this aspect of the invention, substitution of the amino acid corresponding
to Y104 of subtilisin BPN' produced by Bacillus amyloliquefaciens, i.e., amino acid 104 of subtilisin BPN'
or its equivalent, produces a variant having substantially altered substrate specificity. In a preferred
embodiment of the present invention amino acids corresponding to N62, Y104, and G166 of subtilisin BPN'
are substituted with acidic amino acids, preferably Asp and Glu and most preferably Asp. Subtilisin BPN'
variants N62D/Y104D/G166D, N62D/Y104E/G166D, N62E/Y104D/G166E, N62E/Y104E/G166E,
N62E/Y104D/G166D, N62E/Y104E/G166D, N62D/Y104E/G166E, and N62D/Y104D/G166E, and their
equivalents are preferred. Most preferred among this group of subtilisin variants are the
N62D/Y104D/G166D subtilisin BPN' variants and their equivalents.
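
The eight preferred variants are simply every combination of Asp or Glu at positions 62, 104 and 166. A short sketch, with hypothetical names, that enumerates them in the same N62x/Y104x/G166x notation:

```python
from itertools import product

# Wild-type residue and position for each substituted site in subtilisin BPN'.
SITES = (("N", 62), ("Y", 104), ("G", 166))
ACIDIC = ("D", "E")  # Asp, Glu


def acidic_variants(sites=SITES, acids=ACIDIC):
    """Return every variant name with an acidic residue at each listed site."""
    names = []
    for combo in product(acids, repeat=len(sites)):
        parts = [f"{wild}{pos}{new}" for (wild, pos), new in zip(sites, combo)]
        names.append("/".join(parts))
    return names


variants = acidic_variants()
```

The first name generated is N62D/Y104D/G166D, the most preferred variant; the remaining seven correspond to the other combinations listed above.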

Mutagenesis and Synthetic Techniques
Various techniques are available which may be employed to produce mutant DNA, which can encode
the subtilisin variants of the present invention. For instance, it is possible to derive mutant DNA based on
naturally occurring DNA sequences that encode for changes in an amino acid sequence of the resultant
protein relative to a precursor subtilisin. These mutant DNA can be used to obtain the variants of the present
invention.

According to the invention, specific residues of B. amyloliquefaciens subtilisin are identified for
substitution. These amino acid residue position numbers refer to those assigned to the B. amyloliquefaciens
subtilisin sequence (see the mature sequence in Fig. 1 of U.S. Patent No. 4,760,025). The invention,
however, is not limited to the mutation of this particular subtilisin but extends to precursor subtilisins
containing amino acid residues which are equivalent, as defined herein, to the particular identified residues
in B. amyloliquefaciens subtilisin. Equivalent amino acids can be found in, for instance, subtilisin Carlsberg
from Bacillus licheniformis (Bode et al. (1986) EMBO J., 5:813-818); thermitase from Thermoactinomyces
vulgaris (Gros et al. (1989) J. Mol. Biol. 210:347-367); and proteinase K from Tritirachium album (Betzel
et al. (1988) Acta Crystallogr., B, 44:163-172), as described by Siezen et al. (1991) Prot. Eng., 4:719-737.

By way of illustration, with expression vectors encoding the precursor subtilisin in hand (see for
example U.S. Patent No. 4,760,025) site specific mutagenesis (Kunkel et al. (1991) Methods Enzymol.
204:125-139; Carter, P., et al. (1986) Nucl. Acids Res. 13:4331; Zoller, M. J. et al. (1982) Nucl. Acids
Res. 10:6487), cassette mutagenesis (Wells, J. A., et al. (1985) Gene 34:315), restriction selection
mutagenesis (Wells, J. A., et al. (1986) Philos. Trans. R. Soc. London Ser. A 317, 415) or other known
techniques may be performed on the DNA. The mutant DNA can then be used in place of the parent DNA
by insertion into the appropriate expression vectors. Growth of host bacteria containing the expression
vectors with the mutant DNA allows the production of variants which can be isolated as described herein.
Oligonucleotide-mediated mutagenesis is a preferred method for preparing the variants of the present
invention. This technique is well known in the art as described by Adelman et al. (1983) DNA, 2:183.
Briefly, the native or unaltered DNA of a precursor subtilisin, for instance subtilisin BPN', is altered by
hybridizing an oligonucleotide encoding the desired mutation to a DNA template, where the template is the
single-stranded form of a plasmid or bacteriophage containing the unaltered or native DNA sequence of the
precursor.

After hybridization, a DNA polymerase is used to synthesize an entire second complementary strand
of the template that will thus incorporate the oligonucleotide primer, and will code for the selected alteration
in the DNA.

Generally, oligonucleotides of at least 25 nucleotides in length are used. An optimal oligonucleotide
will have 12 to 15 nucleotides that are completely complementary to the template on either side of the
nucleotide(s) coding for the mutation. This ensures that the oligonucleotide will hybridize properly to the
single-stranded DNA template molecule. The oligonucleotides are readily synthesized using techniques
known in the art such as those described by Crea et al. (1978) Proc. Natl. Acad. Sci. USA, 75:5765.
Exemplary oligonucleotide sequences for introducing amino acid changes into precursor subtilisin BPN' are
provided in Tables 2 and 4.
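
The flank rule above (12 to 15 matching nucleotides on either side of the changed codon) can be sketched as a small helper. This is an illustration under stated assumptions rather than the procedure used for the oligonucleotides in Tables 2 and 4: the function name, arguments and the example template are hypothetical, and the result is written in the same sense as the input sequence, so its reverse complement may be the strand to synthesize depending on which strand serves as the single-stranded template.

```python
def design_mutagenic_oligo(template, codon_start, new_codon, flank=12):
    """Build an oligonucleotide carrying one replaced codon with perfectly
    matching flanks of `flank` nucleotides on each side.

    template    -- coding sequence as a string of A/C/G/T
    codon_start -- 0-based index of the first base of the codon to replace
    new_codon   -- three-base replacement codon
    """
    if len(new_codon) != 3:
        raise ValueError("new_codon must be exactly three bases")
    if not 12 <= flank <= 15:
        raise ValueError("flank should be 12 to 15 nucleotides")
    left = codon_start - flank
    right = codon_start + 3 + flank
    if left < 0 or right > len(template):
        raise ValueError("template too short for the requested flanks")
    return template[left:codon_start] + new_codon + template[codon_start + 3:right]


# Hypothetical 40-nt template used only to exercise the helper.
example_template = "ACGT" * 10
example_oligo = design_mutagenic_oligo(example_template, 18, "GAC", flank=12)
```

With 12-nucleotide flanks the result is 27 nucleotides long, which meets the stated minimum of 25; 15-nucleotide flanks give 33.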

Single-stranded DNA template may also be generated by denaturing double-stranded plasmid (or
other) DNA using standard techniques.

For alteration of the native DNA sequence (to generate amino acid sequence variants, for example),
the oligonucleotide is hybridized to the single-stranded template under suitable hybridization conditions.

A DNA polymerizing enzyme, usually the Klenow fragment of DNA polymerase I, is then added to
synthesize the complementary strand of the template using the oligonucleotide as a primer for synthesis. A
heteroduplex molecule is thus formed such that one strand of DNA encodes the variant form of the subtilisin,
and the other strand (the original template) encodes the native, unaltered sequence of the precursor subtilisin.
This heteroduplex molecule is then transformed into a suitable host cell. After the cells are grown, they are
plated onto agarose plates and screened using the oligonucleotide primer radiolabeled with 32-phosphate to
identify the bacterial colonies that contain the mutated DNA. The mutated region is then removed and
placed in an appropriate vector for protein production, generally an expression vector of the type typically
employed for transformation of an appropriate host.

The method described immediately above may be modified such that a homoduplex molecule is
created wherein both strands of the plasmid contain the mutation(s). The modifications are as follows: The
single-stranded oligonucleotide is annealed to the single-stranded template as described above. A mixture
of three deoxyribonucleotides, deoxyriboadenosine (dATP), deoxyriboguanosine (dGTP), and
deoxyribothymidine (dTTP), is combined with a modified thio-deoxyribocytosine called dCTP-(aS) (which
can be obtained from Amersham Corporation). This mixture is added to the template-oligonucleotide
complex. Upon addition of DNA polymerase to this mixture, a strand of DNA identical to the template
except for the mutated bases is generated. In addition, this new strand of DNA will contain dCTP-(aS)
instead of dCTP, which serves to protect it from restriction endonuclease digestion.

After the template strand of the double-stranded heteroduplex is nicked with an appropriate restriction
enzyme, the template strand can be digested with ExoIII nuclease or another appropriate nuclease past the
region that contains the site(s) to be mutagenized. The reaction is then stopped to leave a molecule that is
only partially single-stranded. A complete double-stranded DNA homoduplex is then formed using DNA
polymerase in the presence of all four deoxyribonucleotide triphosphates, ATP, and DNA ligase. This
homoduplex molecule can then be transformed into a suitable host cell as described above.

DNA encoding variants with more than one amino acid to be substituted may be generated in one of
several ways. If the amino acids are located close together in the polypeptide chain, they may be mutated
simultaneously using one oligonucleotide that codes for all of the desired amino acid substitutions. If,
however, the amino acids are located some distance from each other (separated by more than about ten amino
acids), it is more difficult to generate a single oligonucleotide that encodes all of the desired changes.
Instead, one of two alternative methods may be employed.

In the first method, a separate oligonucleotide is generated for each amino acid to be substituted. The
oligonucleotides are then annealed to the single-stranded template DNA simultaneously, and the second
strand of DNA that is synthesized from the template will encode all of the desired amino acid substitutions.

The alternative method involves two or more rounds of mutagenesis to produce the desired mutant.
The first round is as described for the single mutants: wild-type DNA is used for the template, an
oligonucleotide encoding the first desired amino acid substitution(s) is annealed to this template, and the
heteroduplex DNA molecule is then generated. The second round of mutagenesis utilizes the mutated DNA
produced in the first round of mutagenesis as the template. Thus, this template already contains one or more
mutations. The oligonucleotide encoding the additional desired amino acid substitution(s) is then annealed
to this template, and the resulting strand of DNA now encodes mutations from both the first and second
rounds of mutagenesis. This resultant DNA can be used as a template in a third round of mutagenesis, and
so on.

Cleavage of a Fusion Protein With Subtilisin Variants

A fusion protein is any polypeptide that contains within it an affinity domain (AD) that usually aids
in protein purification, a protease cleavage sequence or substrate linker (SL), which is cleaved by a protease,
and a protein product of interest (PP). Such fusion proteins are generally expressed by recombinant DNA
technology. Genes for fusion proteins are designed so that the SL is between the AD and PP. These
usually take the form AD-SL-PP such that the domain closest to the N-terminus is AD and PP is closest to
the C-terminus.

Examples of AD would include glutathione-S-transferase, which binds to glutathione; protein A (or
derivatives or fragments thereof), which binds IgG molecules; poly-histidine sequences, particularly (His)6
(SEQ ID NO: 51), that bind metal affinity columns; maltose binding protein, which binds maltose; human
growth hormone, which binds the human growth hormone receptor; or any of a variety of other proteins or
protein domains that can bind to an immobilized affinity support with an association constant (Ka) of > 1 o'
M-'.

The SL can be any sequence which is cleaved by the subtilisin variants of the present invention. In
preparations where the variant N62D/Y104D/G166D or its equivalent is used, the SL can be any sequence,
preferably of at least 4 amino acids, in which the P4, P2, and P1 residues are basic residues. Therefore an
SL is employed of the general formula P4-P3-P2-P1 wherein P4, P2, and P1 are basic amino acid
residues. Preferred SLs according to this aspect of the invention include Lys-Ala-Lys-Arg (SEQ ID NO: 87)
and Arg-Ala-Lys-Arg (SEQ ID NO: 86).
Likewise, where the N62D/G166D subtilisin variant is contemplated, the SL preferably contains
dibasic residues. For the variants capable of cleaving substrates containing dibasic residues, the SL should be
at least four residues and preferably contain a large hydrophobic residue at P4 (such as Leu or Met) and
dibasic residues at P2 and P1 (such as Arg and Lys). A particularly good substrate is Leu-Met-Arg-Lys
(SEQ ID NO: 52), but a variety of other sequences may work, including Ala-Ser-Arg-Arg (SEQ ID NO: 50)
and even Leu-Thr-Ala-Arg (SEQ ID NO: 53).

It is often useful that the SL contain a flexible segment on its N-terminus to better separate it from the
AD and PP. Such sequences include Gly-Pro-Gly-Gly (SEQ ID NO: 54) but can be as simple as Gly-Gly
or Pro-Gly. Thus, an example of a particularly good SL would have the sequence Gly-Pro-Gly-Gly-Leu-
Met-Arg-Lys (SEQ ID NO: 88) in the case of subtilisin variants capable of cleaving substrates containing
dibasic amino acids, or Gly-Pro-Gly-Gly-Lys-Ala-Lys-Arg (SEQ ID NO: 89). This sequence
would be inserted between the AD and PP domains.

The PP can be virtually any protein or peptide of interest but preferably should not have a Pro, Ile,
Thr, Val, Asp or Glu as its first residue (P1'), or Pro or Gly at the second residue (P2'), or Pro at the third
residue (P3'). Such residues are poor substrates for the enzyme and may impair the ability of the subtilisin
variant to cleave the SL sequence.
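The linker and product preferences stated above can be collected into a small checker. The sketch below is only a reading aid: it encodes the preferred patterns as hard rules, treats "basic" as Lys or Arg only, and uses hypothetical function names (`check_linker`, `check_product_start`). The text itself notes that sequences outside the preferred pattern, such as Leu-Thr-Ala-Arg, may still be cleaved.

```python
BASIC = {"Lys", "Arg"}               # assumption: "basic" means Lys or Arg
LARGE_HYDROPHOBIC = {"Leu", "Met"}   # the two P4 examples named in the text

# Residues the text advises against at the start of the protein product (PP).
DISFAVORED = {
    "P1'": {"Pro", "Ile", "Thr", "Val", "Asp", "Glu"},
    "P2'": {"Pro", "Gly"},
    "P3'": {"Pro"},
}


def check_linker(sl, variant):
    """Return True if the last four SL residues (P4-P3-P2-P1) fit the preferred pattern."""
    if len(sl) < 4:
        return False
    p4, _p3, p2, p1 = sl[-4:]
    if variant == "N62D/Y104D/G166D":
        return p4 in BASIC and p2 in BASIC and p1 in BASIC
    if variant == "N62D/G166D":
        return p4 in LARGE_HYDROPHOBIC and p2 in BASIC and p1 in BASIC
    raise ValueError(f"no rule recorded for variant {variant!r}")


def check_product_start(pp):
    """Return the primed positions (P1', P2', P3') that carry a disfavored residue."""
    flagged = []
    for label, residue in zip(("P1'", "P2'", "P3'"), pp[:3]):
        if residue in DISFAVORED[label]:
            flagged.append(label)
    return flagged


linker = ["Gly", "Pro", "Gly", "Gly", "Leu", "Met", "Arg", "Lys"]
print(check_linker(linker, "N62D/G166D"))
print(check_product_start(["Pro", "Gly", "Ala"]))
```

Residues are given as three-letter codes in N-to-C order, so the last element of the linker list is P1 and the first element of the product list is P1'.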

Cleavage of the fusion protein is best done in aqueous solution, although it should
be possible to immobilize the enzyme and cleave the soluble fusion protein. It may also be possible to cleave
the fusion protein as it remains immobilized on a solid support (e.g. bound to the solid support through AD)
with the soluble subtilisin variant. It is preferable to add the enzyme to the fusion protein so that the enzyme
is less than one part in 100 (1:100) by weight. A good buffer is 10-50 mM Tris (pH 8.2) in 10 mM NaCl.
A preferable temperature is about 25°C, although the enzyme is active up to 65°C. The extent of cleavage
can be assayed by applying samples to SDS-PAGE. Generally suitable conditions for using the subtilisin
variants of this invention do not depart substantially from those known in the art for the use of other
subtilisins.

EXAMPLES

In the examples below and elsewhere, the following abbreviations are employed: subtilisin BPN',
subtilisin from Bacillus amyloliquefaciens; Boc-RVRR-MCA (SEQ ID NO. 73), N-t-butoxycarbonyl-
arginine-valine-arginine-arginine-7-amido-4-methyl coumarin; suc-Ala-Ala-Pro-Phe-pNA (SEQ ID NO.
56), N-succinyl-alanine-alanine-proline-phenylalanyl-p-nitroanilide (SEQ ID NO. 56); hGH, human
growth hormone; hGHbp, extracellular domain of the hGH receptor; PBS, phosphate buffered saline; AP,
alkaline phosphatase.

Example 1

Construction and Purification of Subtilisin Mutants.

Site-directed mutations were introduced into the subtilisin BPN' gene cloned into the phagemid pSS5
(Wells, J. A., Ferrari, E., Henner, D. J., Estell, D. A. and Chen, E. Y. (1983) Nucl. Acids Res. 11:7911-
7929). Single-stranded uracil-containing pSS5 template was prepared and mutagenesis performed using the
method of Kunkel (Kunkel, T. A., Bebenek, K. and McClary, J. (1991) Methods Enzymol. 204:125-139).
For example, the synthetic oligonucleotide N62D,

(5'-CCAAGACAACG*ACTCTCACGGAA-3') (SEQ ID NO. 25)

in which the asterisk denotes a mismatch to the wild-type sequence, was used to construct the N62D mutant.
The oligonucleotide was first phosphorylated at the 5' end using T4 polynucleotide kinase according to a
described procedure (Sambrook, J., Fritsch, E. F., and Maniatis, T. (1989) in "Molecular Cloning: A
Laboratory Manual," Second Edition, Cold Spring Harbor, N.Y.). The phosphorylated oligonucleotide was
annealed to single-stranded uracil-containing pSS5 template, the complementary DNA strand was filled in
with deoxynucleotides using T7 DNA polymerase, and the resulting nicks ligated using T4 DNA ligase
according to a previously described procedure (Sambrook, J., Fritsch, E. F., and Maniatis, T. (1989) in
"Molecular Cloning: A Laboratory Manual," Second Edition, Cold Spring Harbor, N.Y.). Heteroduplex
DNA was transformed into the E. coli host JM101 (Yanisch-Perron, C., Vieira, J. and Messing, J. (1985) Gene
33:103-119), and putative mutants were confirmed by preparation and dideoxy nucleotide sequencing of
single-stranded DNA (Sanger, F., Nicklen, S. and Coulson, A. R. (1977) Proc. Natl. Acad. Sci. USA
74:5463-5467) according to the SEQUENASE® protocol (USB Biochemicals). Mutant single-stranded
DNA was then retransformed into JM101 cells and double-stranded DNA prepared according to a previously
described procedure (Sambrook, J., Fritsch, E. F., and Maniatis, T. (1989) in "Molecular Cloning: A
Laboratory Manual," Second Edition, Cold Spring Harbor, N.Y.). For other mutations also requiring the use
of one primer, the oligonucleotides used are listed in Table 2. For several of these oligonucleotides,
additional silent mutations emplacing new restriction sites were simultaneously introduced to provide an
alternative verification of mutagenesis.
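To make the role of the mismatch in the N62D oligonucleotide concrete, the sketch below compares the oligonucleotide with an assumed wild-type stretch and translates both. Two things are assumptions rather than statements from the text: the wild-type string (inferred from the single marked mismatch, which implies an AAC codon for Asn62) and the reading-frame offset of 1. The codon table is deliberately minimal and covers only the codons that occur here.

```python
# Minimal codon table: only the codons present in this example.
CODONS = {
    "CAA": "Gln", "GAC": "Asp", "AAC": "Asn",
    "TCT": "Ser", "CAC": "His", "GGA": "Gly",
}

MUTANT_OLIGO = "CCAAGACAACGACTCTCACGGAA"   # N62D oligonucleotide, asterisk removed
WILD_TYPE = "CCAAGACAACAACTCTCACGGAA"      # assumed wild-type stretch (Asn62 = AAC)


def mismatches(a, b):
    """Return (index, base_in_a, base_in_b) wherever two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


def translate(seq, offset):
    """Translate every complete codon of seq, starting at the given offset."""
    return [CODONS[seq[i:i + 3]] for i in range(offset, len(seq) - 2, 3)]


print(mismatches(MUTANT_OLIGO, WILD_TYPE))
print(translate(WILD_TYPE, 1))
print(translate(MUTANT_OLIGO, 1))
```

Under these assumptions the only difference is the base immediately preceding the asterisk, and it changes the fourth translated codon from Asn to Asp.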

To construct the double mutants N62D/G166D and N62D/G166E, pSS5 DNA containing the N62D
mutation was produced in single-stranded uracil-containing form using the Kunkel procedure (Kunkel, T.
A., Bebenek, K. and McClary, J. (1991) Methods Enzymol. 204:125-139). This mutant DNA was used as
template for the further introduction of the G166D or G166E mutations, using the appropriate
oligonucleotide primers (see sequences in Table 2), following the procedures described above.

To construct the triple mutants, such as N62D/Y104D/G166D, pSS5 DNA containing the
N62D/G166D mutation, or other appropriate double mutation, was produced in single-stranded uracil-
containing form using the Kunkel procedure (Kunkel, T. A., Bebenek, K. and McClary, J. (1991) Methods
Enzymol. 204:125-139). This mutant DNA was used as template for the further introduction of the Y104D
mutations, using the appropriate oligonucleotide primers (see sequences in Table 4), following the
procedures described above.

For expression of the subtilisin BPN' mutants, double-stranded mutant DNA was transformed into a
protease-deficient strain (BG2036) of Bacillus subtilis (Yang, M. Y., Ferrari, E. and Henner, D. J.
(1984) Journal of Bacteriology 160:15-21) according to a previous method (Anagnostopoulos, C. and
Spizizen, J. (1961) Journal of Bacteriology 81:741-746) in which transformation mixtures were plated out
on LB plus skim milk plates containing 12.5 µg/mL chloramphenicol. The clear halos indicative of skim
milk digestion surrounding transformed colonies were noted to roughly estimate secreted protease activity.

The transformed BG2036 strains were cultured by inoculating 5 mL of 2xYT media (Miller, J. H.
(1972) in "Experiments in Molecular Genetics," Cold Spring Harbor, N.Y.) containing 12.5 µg/mL
chloramphenicol and 2 mM CaCl2 at 37°C for 18-20 h, followed by 1:100 dilution in the same medium and
growth in shake flasks at 37°C for 18-22 h with vigorous aeration. The cells were harvested by
centrifugation (6000g, 15 min, 4°C), and to the supernatant 20 mM (final) CaCl2 and one volume of ethanol
(-20°C) were added. After 30 min at 4°C, the solution was centrifuged (12,000g, 15 min, 4°C), and one
volume of ethanol (-20°C) added to the supernatant. After 2 h at -20°C, the solution was centrifuged
(12,000g, 15 min, 4°C) and the pellet resuspended in and dialyzed against MC (25 mM 2-(N-
morpholino)ethanesulfonic acid (MES), 5 mM CaCl2 at pH 5.5) overnight at 4°C. The dialysate was passed
through a 0.22 µm syringe filter and loaded onto a mono-S cation exchange column run by an FPLC system
(Pharmacia Biotechnology). The column was washed with 20 volumes of MC and mutant subtilisin eluted
over a linear gradient of zero to 0.15 M NaCl in MC, all at a flow rate of 1 mL/min. Peak fractions were
recovered and the subtilisin mutant quantitated by measuring the absorbance at 280 nm (E280 0.1% = 1.17)
(Matsubara, H., Kasper, C. B., Brown, D. M. and Smith, E. L. (1965) J. Biol. Chem. 240:1125-1130).

Example 2

Kinetic Characterizations

Subtilisins were assayed by measuring the initial rates of hydrolysis of p-nitroanilide tetrapeptide
substrates in 0.4 mL 20 mM Tris-Cl pH 8.2, 4% (v/v) dimethyl sulfoxide at (25 ± 0.2)°C as described
previously (Estell, D. A., Graycar, T. P., Miller, J. V., Powers, D. B., Burnier, J. P., Ng, P. G. and Wells,
J. A. (1986) Science 233:659-663). Enzyme concentrations [E]0 were determined spectrophotometrically
using E280 0.1% = 1.17 (Matsubara, H., Kasper, C. B., Brown, D. M. and Smith, E. L. (1965) J. Biol.
Chem. 240:1125-1130), and were typically 5-50 nM in reactions. Initial rates were determined for nine to
twelve different substrate concentrations over the range of 0.001-2.0 mM. Plots of initial rates (v) versus
substrate concentration [S] were fitted to the Michaelis-Menten equation,

$$ v = \frac{k_{cat}\,[E]_0\,[S]}{K_m + [S]} $$

to determine the kinetic constants kcat and Km (Fersht, A. in "Enzyme Structure and Mechanism", Second
edition, Freeman and Co., N.Y.) using the program Kaleidagraph (Synergy Software, Reading, PA).
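For readers who want to see how kcat and Km fall out of such rate data, the sketch below evaluates the Michaelis-Menten equation and recovers the two constants from synthetic, noise-free rates. It is not the fitting procedure used in the example, which relied on Kaleidagraph; here a Hanes-Woolf linearization (S/v plotted against S) with ordinary least squares is substituted because it needs only the Python standard library. The kinetic values in the usage lines are made up for illustration.

```python
def mm_rate(s, kcat, km, e0):
    """Michaelis-Menten initial rate: v = kcat * [E]0 * [S] / (Km + [S])."""
    return kcat * e0 * s / (km + s)


def fit_hanes_woolf(s_values, v_values, e0):
    """Estimate (kcat, Km) from the straight line S/v = S/Vmax + Km/Vmax."""
    xs = list(s_values)
    ys = [s / v for s, v in zip(s_values, v_values)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # equals 1 / Vmax
    intercept = mean_y - slope * mean_x    # equals Km / Vmax
    vmax = 1.0 / slope
    return vmax / e0, intercept * vmax     # kcat = Vmax / [E]0, Km = intercept * Vmax


# Illustrative values only: kcat in 1/s, concentrations in mM.
true_kcat, true_km, e0 = 50.0, 0.2, 1e-5
substrate = [0.001, 0.005, 0.02, 0.05, 0.1, 0.25, 0.5, 1.0, 2.0]
rates = [mm_rate(s, true_kcat, true_km, e0) for s in substrate]
kcat_fit, km_fit = fit_hanes_woolf(substrate, rates, e0)
print(kcat_fit, km_fit)
```

With real, noisy measurements a direct nonlinear fit of v against [S] is generally preferred, since linearized forms weight the errors unevenly.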

Example 3

Substrate Phage

Substrate phage selections were performed as described by Matthews and Wells (Matthews, D. J. and
Wells, J. A. (1993) Science 260:1113-1117), with minor modifications. Phage sorting was carried out using
a library in which the linker sequence between the gene III coat protein and a tight-binding variant of hGH
was GPGGX5GGPG (SEQ ID NO. 52). The library contained 2x10* independent transformants. Phage
particles were prepared by infecting 1 mL of log phase 27C7 (F'/tetR/OmpT-/degP-) Escherichia coli with
approximately 10*^ library phage for 1 h at 37°C, followed by 18-24 h of growth in 25 mL 2YT medium
containing lo'^ M13K07 helper phage and 50 µg/mL carbenicillin at 37°C. Wells of a 96-well Nunc
Maxisorb microtiter plate were coated with 2 µg/mL of hGHbp in 50 mM NaHCO3 at pH 9.6 overnight at
4°C and blocked with PBS (10 mM sodium phosphate at pH 7.4 and 150 mM NaCl) containing 2.5% (w/v)
skim milk for 1 h at room temperature. Between lO" and lo" phage in 0.1 mL 10 mM Tris-Cl (pH 7.6), 1
mM EDTA, and 100 mM NaCl were incubated in the wells at room temperature for 2 h with gentle agitation.
The plate was washed first with 20 rinses of PBS plus 0.05% Tween 20 and then twice with 20 mM Tris-Cl
at pH 8.2. The N62D/G166D subtilisin was added in 0.1 mL of 20 mM Tris-Cl at pH 8.2 and protease
sensitive phage were eluted after a variable reaction time. The concentration of protease and incubation
times for elution of sensitive phage were decreased gradually over the course of the sorting procedure to
increase selectivity, with protease concentrations of 0.2 nM (rounds 1-3) and 0.1 nM (rounds 4-9), and
reaction times of 5 min (rounds 1-6), 2.5 min (round 7), 40 s (round 8) and 20 s (round 9). Control wells in
which no protease was added were also included in each round. For the resistant phage pool the incubation
time with protease remained constant at 5 min. The wells were then washed ten times with PBS plus 0.05%
Tween 20 and resistant phage eluted by treatment with 0.1 mL of 0.2 M glycine at pH 2.0 in PBS plus 0.05%
Tween 20 for 1 min at room temperature. Protease sensitive and resistant phage pools were titered and used
to infect log phase 27C7 cells for 1 h at 37°C, followed by centrifugation at 4000 rpm, removal of supernatant,
and resuspension in 1 mL 2YT medium. The infected cells were then grown 18-24 h in the presence of
helper phage as described above and the process repeated 9 times. Selected substrates were introduced into
AP fusion proteins and assayed for relative rates of cleavage as described by Matthews and Wells (Matthews,
D. J., Goodman, L. J., Gorman, C. M. and Wells, J. A. (1994) Protein Science 3:1197-1205 and Matthews,
D. J. and Wells, J. A. (1993) Science 260:1113-1117), except that the cleavage reactions were performed in
20 mM Tris-Cl at pH 8.2.
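The stringency schedule for the protease-sensitive pool described in this example can be written down as a small lookup, which makes it easy to see how both protease concentration and reaction time tighten over the nine rounds. The function name `selection_conditions` is a hypothetical helper; the numbers are those given in the text.

```python
def selection_conditions(round_number):
    """Return (protease concentration in nM, reaction time in seconds) for a sorting round."""
    if not 1 <= round_number <= 9:
        raise ValueError("the text describes rounds 1 through 9 only")
    protease_nm = 0.2 if round_number <= 3 else 0.1   # 0.2 nM rounds 1-3, 0.1 nM rounds 4-9
    if round_number <= 6:
        seconds = 300      # 5 min, rounds 1-6
    elif round_number == 7:
        seconds = 150      # 2.5 min
    elif round_number == 8:
        seconds = 40
    else:
        seconds = 20
    return protease_nm, seconds


for r in range(1, 10):
    print(r, selection_conditions(r))
```

The resistant pool is not covered by this helper, since its incubation time stayed at 5 min throughout.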

Example 4

Substrate phage selection and cleavage of a fusion protein

Subtilisin has the capability to bind substrates from the P4 to ?V positions (McPhalen, C. A. and
James, M. N. G. (1988) Biochemistry 27:6582-6598 and Bode, W., Papamokos, E., Musil, D., Seemueller, U.
and Fritz, H. (1986) EMBO J. 5:813-818). Given this extensive binding site and the apparent cooperative
nature in the way the substrate can bind the enzyme, we wished to explore more broadly the substrate
preferences for the enzyme. To do this we utilized the substrate phage selection (Matthews, D. J., Goodman,
L. J., Gorman, C. M. and Wells, J. A. (1994) Protein Science 3:1197-1205 and Matthews, D. J. and Wells,
J. A. (1993) Science 260:1113-1117) described in Example 3. In this method a five-residue substrate linker
that was flanked by di-glycine residues is inserted between an affinity domain (in this case a high affinity
variant of hGH) and the carboxy-terminal domain of gene III, a minor coat protein displayed on the surface
of the filamentous phage, M13. The five-residue substrate linker is fully randomized to generate a library of
20^5 different protein sequence variants. These are displayed on the phage particles, which are allowed to
bind to the hGHbp. The protease of interest was added, and if it cleaved the phage particle at the substrate
linker it released that particle. The particles released by protease treatment can be propagated and subjected
to another round of selection to further enrich for good protease substrates. Sequences that are retained can
also be propagated to enrich for poor protease substrates. By sequencing the isolated phage genes at the end
of either selection one can identify good and poor substrates for further analysis.
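The size of the sequence space for a fully randomized five-residue linker follows directly from the number of amino acids. The sketch below computes it and, under an added simplifying assumption that every clone samples the protein sequences uniformly and independently, estimates what fraction of sequences a library of a given size would contain. Codon bias from the randomization scheme is ignored, and the number of clones is left as a parameter rather than taken from the text.

```python
import math


def linker_diversity(randomized_positions, alphabet_size=20):
    """Number of distinct protein sequences for a fully randomized linker."""
    return alphabet_size ** randomized_positions


def expected_coverage(n_clones, diversity):
    """Expected fraction of sequences present at least once (Poisson approximation)."""
    return 1.0 - math.exp(-n_clones / diversity)


diversity = linker_diversity(5)
print(diversity)
# Illustrative library size equal to the diversity itself.
print(expected_coverage(diversity, diversity))
```

Under the uniform-sampling assumption, a library with as many clones as there are possible sequences would still miss roughly a third of them.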

We chose to focus on the subtilisin BPN' variant N62D/G166D as it was slightly better at
discriminating the synthetic dibasic substrates from the others. We subjected the substrate phage library to
nine rounds of selection with the subtilisin variant and isolated clones that were either increasingly sensitive
or resistant to cleavage. Of twenty-one clones sequenced from the sensitive pool, eighteen contained dibasic
residues, eleven of which had the substrate linker sequence Asn-Leu-Met-Arg-Lys (SEQ ID NO: 35) (Table
6).

TABLE 6

Substrate phage sequences sensitive or resistant to N62D/G166D subtilisin from a GG-xxxxx-GG library
after 9 rounds of selection(a).

Protease Sensitive Pool

N L T A R (3)     (SEQ ID NO: 34)
N L M R K (11)    (SEQ ID NO: 35)
T A S R R         (SEQ ID NO: 36)
t T R a S         (SEQ ID NO: 37)
A L S S K         (SEQ ID NO: 38)
t M X. a K        (SEQ ID NO: 39)

Protease Resistant Pool

A S T H F         (SEQ ID NO: 40)
I Q Q Q Y         (SEQ ID NO: 43)
Q G E X- P        (SEQ ID NO: 47)
A P D P           (SEQ ID NO: 46)
Q L L E H         (SEQ ID NO: 47)
V N N N H         (SEQ ID NO: 46)
A Q S N X-        (SEQ ID NO: 49)

Monobasic Sites

Q K P N F         (SEQ ID NO: 41)
^ P G A H         (SEQ ID NO: 44)

R K P T H         (SEQ ID NO: 42)


(a) Numbers in parentheses indicate the number of times a particular DNA sequence was isolated.

Three (3) of the sensitive sequences were monobasic, Asn-Leu-Thr-Ala-Arg (SEQ ID NO: 34). It is
known that subtilisin has a preference for hydrophobic residues at the P4 position. If these and the other
selected substrates were indeed cleaved after the last basic residue they all would have a Leu, Met or Ala at
the P4 position. Almost no basic residues were isolated in the protease resistant pool and those that were had
a Pro following the mono- or dibasic residue. It is known that subtilisin does not cleave substrates containing
Pro at the P1' position (Carter, P., Nilsson, B., Burnier, J., Burdick, D. and Wells, J. A. (1989) Proteins:
Struct., Funct., Genet. 6:240-248). Thus, dibasic substrates were highly selected and these had the
additional feature of Leu, Met or Ala at the P4 position.
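The P4 argument above can be checked mechanically for the three selected linkers that are named in full in this document. The sketch below assumes cleavage directly after the last basic residue, counts only Lys (K) and Arg (R) as basic, and prepends the N-terminal Gly-Gly flank of the GG-xxxxx-GG library so that P4 is defined even when the basic residue lies early in the linker. The function name `p_sites` is a hypothetical helper.

```python
BASIC = {"K", "R"}   # assumption: basic residues taken as Lys and Arg only


def p_sites(linker):
    """Assign P1..P4 (one-letter codes) assuming cleavage after the last basic residue.

    Returns None when the linker contains no Lys or Arg.
    """
    context = "GG" + linker   # N-terminal Gly-Gly flank from the library design
    basic_indices = [i for i, aa in enumerate(context) if aa in BASIC]
    if not basic_indices:
        return None
    p1 = basic_indices[-1]
    return {
        f"P{n}": context[p1 - n + 1]
        for n in range(1, 5)
        if p1 - n + 1 >= 0
    }


for seq in ("NLMRK", "TASRR", "NLTAR"):
    print(seq, p_sites(seq))
```

For these three sequences the residue assigned to P4 is Leu, Ala and Leu respectively, in line with the observation in the text.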

Example 5

Cleavage of Substrate Linkers

We wished to analyze how efficiently the most frequently selected sequences were cleaved in the
context of a fusion protein. For this we applied an alkaline phosphatase-fusion protein assay (Matthews, D.
J., Goodman, L. J., Gorman, C. M. and Wells, J. A. (1994) Protein Science 3:1197-1205 and Matthews, D.
J. and Wells, J. A. (1993) Science 260:1113-1117). The hGH substrate linker domains were excised from the
phage vector by PCR and fused in front of the gene for E. coli AP. The fusion protein was expressed and
purified on an hGH receptor affinity column. The fusion protein was bound to the hGH receptor on a plate
and treated with the subtilisin variant. The rate of cleavage of the fusion protein from the plate was monitored
by collecting soluble fractions as a function of time and assaying for AP activity (Figure 5). The most
frequently isolated substrate sequence, Asn-Leu-Met-Arg-Lys (SEQ ID NO: 35), was cleaved about ten times
faster than the next most frequently isolated clones (Thr-Ala-Ser-Arg-Arg (SEQ ID NO: 36) and Asn-Leu-
Thr-Ala-Arg (SEQ ID NO: 34)). The cleaved AP products were also recovered and subjected to N-terminal
sequencing to determine the sites of cleavage (Figure 5, cleavage site denoted by I). In all three fusion
proteins, this site was immediately following the dibasic or monobasic site according to the mutant subtilisin
design. We also tested the dibasic sequence isolated from the resistant pool, namely Arg-Lys-Pro-Thr-His
(SEQ ID NO: 42). We observed no detectable cleavage above background for this substrate during the
assay.



The present invention has of necessity been discussed herein by reference to certain specific methods
and materials. It is to be understood that the discussion of these specific methods and materials in no way
constitutes any limitation on the scope of the present invention, which extends to any and all alternative
materials and methods suitable for accomplishing the ends of the present invention.



All references cited herein are expressly incorporated by reference.

SEQUENCE LISTING


(1) GENERAL INFORMATION:

(i) APPLICANT: Genentech, Inc.

(ii) TITLE OF INVENTION: SUBTILISIN VARIANTS CAPABLE OF CLEAVING
SUBSTRATES CONTAINING BASIC RESIDUES

(iii) NUMBER OF SEQUENCES: 89

(iv) CORRESPONDENCE ADDRESS:
    (A) ADDRESSEE: Genentech, Inc.
    (B) STREET: 460 Point San Bruno Blvd
    (C) CITY: South San Francisco
    (D) STATE: California
    (E) COUNTRY: USA
    (F) ZIP: 94080

(v) COMPUTER READABLE FORM:
    (A) MEDIUM TYPE: 3.5 inch, 1.44 Mb floppy disk
    (B) COMPUTER: IBM PC compatible
    (C) OPERATING SYSTEM: PC-DOS/MS-DOS
    (D) SOFTWARE: WinPatin (Genentech)

(vi) CURRENT APPLICATION DATA:
    (A) APPLICATION NUMBER:
    (B) FILING DATE:
    (C) CLASSIFICATION:

(vii) PRIOR APPLICATION DATA:
    (A) APPLICATION NUMBER: 08/398028
    (B) FILING DATE: 03-MAR-1995

(viii) ATTORNEY/AGENT INFORMATION:
    (A) NAME: Kubinec, Jeffrey S.
    (B) REGISTRATION NUMBER: 36,575
    (C) REFERENCE/DOCKET NUMBER: P0936P1PCT

(ix) TELECOMMUNICATION INFORMATION:
    (A) TELEPHONE: 415/225-8228
    (B) TELEFAX: 415/952-9881
    (C) TELEX: 910/371-7168

(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 8119 base pairs
    (B) TYPE: Nucleic Acid
    (C) STRANDEDNESS: Single
    (D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:



GAATTCNGGT CTACTAAAAT ATTATTCCAT ACTATACAAT TAATACACAG SO 

AATAATCTGT CTATTGGTTA TTCTGCAAAT GAAAAAAAGG AGAGGATAAA 100 

GA GTG AGA GGC AAA AAA GTA TGG ATC AGT TTG CTG TTT 138 
Val Arg Gly Lys Lys Val Trp He Ser Leu Leu Phe 
45 -107 -105 -100 

6CT TTA GCG TTA ATC TTT ACG ATG GCG TTC GGC AGC ACA 177 
Ala Leu Ala Leu He Phe Thr Met Ala Phe Gly Ser Thr 



10 



wo 96/27671 

-95 -90 -85 

TCC TCT GCC CAG GCG GCA GGG AAA TCA AAC GGG GAA AAG 216 
Ser Ser Ala Gin Ala Ala Gly Lys Ser Asn Gly Glu Lys 
-80 -''S ''^^ 

AAA TAT ATT GTC GGG TTT AAA CAG ACA ATG AGC ACG ATG 255 
Lvs Tyx lie Val Gly Phe Lys Gin Thr Met Ser Thr Met 
-65 -60 

AGC GCC GCT AAG AAG AAA GAT GTC ATT TCT GAA AAA GGC 294 
Ser Ala Ala Lys Lys Lys Asp Val lie Ser Glu Lys Gly 
-55 -SO -45 

GGG AAA GTG CAA AAG CAA TTC AAA. TAT GTA GAC GCA GCT 333 
Glv Lys Val Gin Lys Gin Phe Lys Tyr Val Asp Ala Ala 
-40 -35 

TCA GCT ACA TTA AAC GAA AAA GCT GTA AAA GAA TTG AAA 37 2 
15 Ser^ Ala Thr Leu Asn Glu Lys Ala Val Lys Glu Leu Lys 
-30 -25 -20 

AAA GAC CCG AGC GTC GCT TAC GTT GAA GAA GAT CAC GTA 411 
Lys Asp Pro Ser Val Ala Tyr Val Glu Glu Asp His Val 
-15 -10 -S 

20 GCA CAT GCG TAC GCG CAG TCC GTG CCT TAC GGC GTA TCA 450 
Ala His Ala Tyr Ala Gin Ser Val Pro Tyr Gly Val Ser 
1 5 

CAA ATT AAA GCC CCT GCT CTG CAC TCT CAA GGC TAC ACT 489 
Gin lie Lys Ala Pro Ala Leu His Ser Gin Gly Tyr Thr 
25 10 15 20 

GGA TCA AAT GTT AAA GTA GCG GTT ATC GAC AGC GGT ATC 528 
Glv Ser Asn Val Lys Val Ala Val He Asp Ser Gly He 
25 30 35 

GAT TCT TCT CAT CCT GAT TTA AAG GTA GCA GGC GGA GCC 56"? 
30 Asp Ser Ser His Pro Asp Leu Lys Val Ala Gly Gly Ala 

40 45 

AGC ATG GTT CCT TCT GAA ACA AAT CCT TTC CAA GAC AAC 606 
Ser Met Val Pro Ser Glu Thr Asn Pro Phe Gin Asp Asn 
50 55 60 

35 GAC TCT CAC GGA ACT CAC GTT GCC GGC ACA GTT GCG GCT 64 5 
Asp Ser His Gly Thr His Val Ala Gly Thr Val Ala Ala 
65 ■'O 

CTT AAT AAC TCA ATC GGT GTA TTA GGC GTT GCG CCA AGC 684 
Leu Asn Asn Ser He Gly Val Leu Gly Val Ala Pro Ser 
40 75 80 85 

GCA TCA CTT TAC GCT GTA AAA GTT CTC GGT GCT GAC GGT 723 
Ala Ser Leu Tyr Ala Val Lys Val Leu Gly Ala Asp Gly 
90 95 100 

TCC GGC CAA TAC AGC TGG ATC ATT AAC GGA ATC GAG TGG 762 
45 Ser Gly Gin Tyr Ser Trp He He Asn Gly He Glu Trp 

105 110 

GCG ATC GCA AAC AAT ATG GAC GTT ATT AAC ATG AGC CTC 801 
Ala He Ala Asn Asn Met Asp Val He Asn Met Ser Leu 
115 120 125
GGC GGA CCT TCT GGT TCT GCT GCT TTA AAA GCG GCA GTT 840
Gly Giy Pro Ser Gly Ser Ala Ala Leu Lys Ala Ala Val 
130 " 135 

. GAT AAA GCC GTT GCA TCC GGC GTC GTA GTC GTT GCG GCA B"7 9 
5 Asp lys Ala Val Ala Ser Gly Val Val Val Val Ala Ala 
140 145 150 

GCC GGT AAC GAA GGC ACT TCC GGC AGC TCG TCG ACA GTG 918 
Ala Gly Asn Glu Gly Thr Ser Gly Ser Ser Ser Thr Val 
155 . 160 165 

10 GAC TAC CCT GGC AAA TAC CCT TCT GTC ATT GCA GTA GGC 95"? 
Asp Tyr Pro Gly Lys Tyr Pro Ser Val lie Ala Val Gly 
no 175 

GCT GTT GAC AGC AGC AAC CAA AGA GCA TCT TTC TCA AGC 996 
Ala Val Asp Ser Ser Asn Gin Arg Ala Ser Phe Ser Ser 
15 ISO 185 190 

GTA GGA CCT GAG CTT GAT GTC ATG GCA CCT GGC GTA TCT 1035 
Val Gly Pro Glu Leu Asp Val Met Ala Pro Gly Val Ser 
195 200 

ATC CAA AGC ACG CTT CCT GGA AAC AAA TAC GGG GCG TAC 1074 
20 lie Gin Ser Thr Leu Pro Gly Asn Lys Tyr Gly Ala Tyr 
205 ' 210 215 

AAC GGT ACC TCA ATG GCA TCT CCG CAC GTT GCC GGA GCG 1113 
Asn Gly Thr Ser Met Ala Ser Pro His Val Ala Gly Ala 
i 220 225 230 

25 GCT GCT TTG ATT CTT TCT AAG CAC CCG AAC TGG ACA AAC 1152 
Ala Ala Leu lie Leu Ser Lys His Pro Asn Trp Thr Asn 
235 240 

ACT CAA GTC CGC AGC AGT TTA GAA AAC ACC ACT ACA AAA 1191 
Thr Gin Val Arg Ser Ser Leu Glu Asn Thr Thr Thr Lys 
30 245 250 255 

CTT GGT GAT TCT TTC TAC TAT GGA AAA GGG CTG ATC AAC 1230 
Leu Gly Asp Ser Phe Tyr Tyr Gly Lys Gly Leu lie Asn 
260 265 



35 



40 



45 



GTA CAG GCG 
Val Gin Ala 
270 


GCA GCT CAG TA AAACATAAAA AACCGGCCTT 1270 
Ala Ala Gin 
275 




GGCCCCGCCG 


GTTTTTTATT 


ATTTTTCTTC 


CTCCGCATGT 


TCAATCCGCT 


1320 


CCATAATCGA 


CGGATGGCTC 


CCTCTGAAAA 


TTTTAACGAG 


AAACGGCGGG 


1370 


TTGACCCGGC 


TCAGTCCCGT 


AACGGCCAAG 


TCCTGAAAC6 


TCTCAATCGC 


1420 


CGCTTCCCGG 


TTTCCGGTCA 


GCTCAATGCC 


GTAACGGTCG 


GCGGCGTTTT 


1470 


CCTGATACCG 


GGAGACGGCA 


TTC6TAATCG 


GATCCGGAAA 


TTGTAAACGT 


1520 


TAATATTTTG 


TTAAAATTCG 


CGTTAAATTT 


TTGTTAAATC 


AGCTCATTTT 


1570 


TTAACCAATA 


GGCCGAAATC 


GGCAAAATCC 


CTTATAAATC 


AAAAGAATAG 


1620 


ACCGAGATAG 


GGTTGAGTGT 


TGTTCCAGTT 


TGGAACAAGA 


GTCCACTATT 


1670 


AAAGAACGTG 


GACTCCAACG 


TCAAAGGGCG 


AAAAACCGTC 


TATCAGGGCT 


1720 




ATGGCCCACT ACGTGAACCA .TCACCCTAAT CAAGTTTTTT GGGGTCGAGG mO 
TGCCGTAAAG CACTAAATCG GAACCCTAAA GGGAGCCCCC GATTTAGAGC 1820 
TTGACGGGGA AAGCCGGCGA ACGTGGCGAG AAAGGAAGGG AAGAAAGCGA 1870 
AAGGAGCGGG CGCTAGGGCG CTGGCAAGTG TAGCGGTCAC GCTGCGCGtA 1920 
5 ACCACCACAC CCGCCGCGCT TAATGCGCCG CTACAGGGCG CGTCCGGATC IS^O 
NGATCCGACG CGAGGCTGGA TGGCCTTCCC CATTATGATT CTTCTCGCTT 2020 
CCGGCGGCAT CGGGATGCCC GCGTTGCAGG CCATGCTGTC CAGGCAGGTA 2C70 
GATGACGACC ATCAGGGACA GCTTCAAGGA TCGCTCGCGG CTCTTACCAG 2120 
CCTAACTTCG ATCACTGGAC CGCTGATCGT CACGGCGATT TATGCCGCCT 2170 
10 CGGCGAGCAC ATGGAACGGG TTGGCATGGA TTGTAGGCGC CGCCCTATAC 2220 
CTTGTCTGCC TCCCCGCGTT GCGTCGCGGT GCATGGAGCC GGGCCACCTC 2270 
GACCTGAATG GAAGCCGGCG GCACCTCGCT AACGGATTCA CCACTCCAAG 2320 
AATTGGAGCC AATCAATTCT TGCGGAGAAC TGTGAATGCG CAAACCAACC 2370 
CTTGGCAGAA CATATCCATC GCGTCCGCCA TCTCCAGCAG CCGCACGCGG 2420 
15 CGCATCrCGG GCCGCGTTGC TGGCGTTTTT CCATAGGCTC CGCCCCCCTG 2470 
ACGAGCATCA CAAAAATCGA CGCTCAAGTC AGAGGTGGCG AAACCCGACA 2520 
GGACTATAAA GATACCAGGC GTTTCCCCCT GGAAGCTCCC TCGTGCGCTC 2570 
TCCTGTTCCG ACCCTGCCGC TTACCGGATA CCTGTCCGCC TTTCTCCCTT 2620 
CGGGAAGCGT GGCGCTTTCT CAATGCTCAC GCTGTAGGTA TCTCAGTTCG 2670 
20 GTGTAGGTCG TTCGCTCCAA GCTGGGCTGT GTGCACGAAC CCCCCGTTCA 2720 
GCCCGACCGC TGCGCCTTAT CCGGTAACTA TCGTCTTGAG TCCAACCCGG 2770 
TAAGACACGA CTTATCGCCA CTGGCAGCAG CCACTGGTAA CAGGATTAGC 2820 
AGAGCGAGGT ATGTAGGCGG TGCTACAGAG TTCTTGAAGT GGTGGCCTAA 2870 
CTACGGCTAC ACTAGAAGGA CAGTATTTGG TATCTGCGCT CTGCTGAAGC 2 920 
25 CAGTTACCTT CGGAAAAAGA GTTGGTAGCT CTTGATCCGG CAAACAAACC 2970 
ACCGCTGGTA GCGGTGGTTT TTTTGTTTGC AAGCAGCAGA TTACGCGCAG 3020 
AAAAAAAGGA TCTCAAGAAG ATCCTTTGAT CTTTTCTACG GGGTCTGACG 3070 
CTCAGTGGAA CGAAAACTCA CGTTAAGGGA TTTTGGTCAT GAGATTATCA 3120 
AAAAGGATCT TCACCTAGAT CCTTTTAAAT TAAAAATGAA GTTTTAAATC 3170 
30 AATCTAAA6T ATATATGAGT AAACTTGGTC TGACAGTTAC CAATGCTTAA 3220 
TCAGTGAGGC ACCTATCTCA GCGATCTGTC TATTTCGTTC ATCCATAGTT 3270 
GCCTGACTCC CCGTCGTGTA GATAACTACG ATACGGGAGG GCTTACCATC 3320 
TGGCCCCAGT GCTGCAATGA TACCGCGAGA CCCACGCTCA CCGGCTCCAG 3370
ATTTATCAGC AATAAACCAG CCAGCCGGAA

CCTGCAACTT TATCCGCCTC CATCCAGTCT 

TAGAGTAAGT AGTTCGCCAG TTAATAGTTT 

CTGCAGGCAT CGTGGTGTCA CGCTCGTCGT 

5 TCCG6TTCCC AACGATCAAG GCGAGTTACA 

AAAAGCGGTT AGCTCCTTCG GTCCTCCGAT 

CCGCAGT6TT ATCACTCATG GTTATGGCAG 

GTCAT6CCAT CCGTAAGATG CTTTTCTGTG 

GTCATTCTGA GAATAGTGTA TGCGGCGACC 

10 CAACACGGGA TAATACCGCG CCACATAGCA 
ATT6GAAAAC GTTCTTCGGG GCGAAAACTC 
GAGATCCAGT TCGATGTAAC CCACTCGTGC 
CTTTTACTTT CACCAGCGTT TCTGGGTGAG 
GCCGCAAAAA AGGGAATAAG GGCGACACGG 

15 CTTCCTTTTT CAATATTATT GAAGCA7TTA 
GCGGATACAT ATTTGAATGT ATTTAGAAAA 
CGCACATTTC CCCGAAAAGT GCCACCTGAC 
CATGACATTA ACCTATAAAA ATAGGCGTAT 
AAGAATTAAT TCCTTAAGGA ACGTACAGAC 

20 CGTTTTTAAG GGGTTTGTAG ACAAGGTAAA 
AAGAAAAACA CGATTTAGAA CCTAAAAAGA 
AACCGAGAGG TAAAAAAAGA ACGAAGTCGA 
AAATAAAAAA AGCACCTGAA AAGGTGTCTT 
GTTCTTTCTT ATCTTGATAC ATATAGAAAT 

25 TGCTGAAAGG TGCGTTGAAG TGTTGGTATG 
AAACCCTTAA AATTGGTTGC ACAGAAAAAC 
GTGACTAAAC AAATAACTAA ATAGATGGGG 
TCCTAATAGT AGCATTTATT CAGATGAAAA 
AGACAAAAAG TGGAAAAGTG AGACCATGGA 

30 GTTGATTACT TTGAACTTCT GCATATTCTT 
A6TAAAAGAT TGTGCTGAAA TATTAGAGTA 
GCGAAAGAAA GTTGTATCGA GTGTGGTTTT 
ATGTGCAACT GGAGGAGAGC AATGAAACAT 



GGGCCGAGCG 


CAGAAGTGGT 


3420 


ATTAATTGTT 


GCCGGGAAGC 


347C 


GCGCAACGTT 


GTTGCCA7TG 


3520 


TTGGTATGGC 


TTCATTCAGC 


3570 


TGATCCCCCA 


TGTTGTGCAA 


3620 


CGTTGTCAGA 


AGTAAGTTGG 


3670 


CACTGCATAA 


TTCTCTTACT 


3720 


ACTGGTGAGT 


ACTCAACCAA 


3770 


GAGTTGCTCT 


TGCCCGGC6T 


3820 


GAACTTTAAA 


AGTGCTCATC 


3870 


TCAAGGATCT 


TACCGCTGTT 


3920 


ACCCAACTGA 


TCTTCAGCAT 


3970 


CAAAAACAGG 


AAGGCAAAAT 


4020 


AAATGTTGAA 


TACTCATACT 


4070 


TCAGGGTTAT 


TGTCTCATGA 


4120 


ATAAACAAAT 


AGGGGTTCCG 


4170 


GTCTAAGAAA 


CCATTATTAT 


4220 


CACGAGGCCC 


TTTCGTCTTC 


4270 


GGCTTAAAAG 


CCTTTAAAAA 


4320 


GGATAAAACA 


GCACAATTCC 


4370 


ACGAATTTGA 


ACTAACTCAT 


4420 


GATCAGGGAA 


TGAGTTTATA 


.4470 


TTTTTGATGG 


TTTTGAACTT 


4520 


AACGTCATTT 


TTATTTTAGT 


4570 


TATGTGTtTT 


ft ft ft >f*<f*^ft 

AAAGTATTGA 


4620 


CCCATCTGTT 


AAAGTTATAA 


4670 


GTTTCTTTTA 


ATATTATGTG 


4720 


ATCAAGGGTT 


TTAGTGGACA 


4770 


GAGAAAAGAA 


AATCGCTAAT 


4820 


GAATTTAAAA 


AGGCTGAAAG 


4870 


TAAACAAAAT 


CGTGAAACAG 


4920 


GTAAATCCAG 


GCTTTGTCCA 


4970 


GGCATTCAGT 


CACAAAAGGT 


5020
TGTTGCTGAA GTTATTAAAC AAAAGCCAAC AGTTCGTTGG TTGTTTCTCA 5070
CATTAACAGT TAAAAATGTT TATGATGGCG AAGAATTAAA TAAGAGTTTG 5120 
TCAGATATGG CTCAAGGATT TCGCCGAATG ATGCAATATA AAAAAATTAA 5170 
TAAAAATCTT GTTGGTTTTA TGCGTGCAAC GGAAGTGACA ATAAATAATA 5220 
5 AAGATAATTC TTATAATCAG CACATGCATG TATTGGTATG TGTGGAACCA 5270 
ACTTATTTTA AGAATACAGA AAACTACGTG AATCAAAAAC AATGGATTCA 5320 
ATTTTGGAAA AAGGCAAT6A AATTAGACTA TGATCCAAAT GTAAAAGTTC 5370 
AAATGATTCG ACCGAAAAAT AAATATAAAT CGGATATACA ATCGGCAATT 5420 
GACGAAACTG CAAAATATCC TGTAAAGGAT ACGGATTTTA TGACCGATGA 5470 
10 TGAAGAAAAG AATTTGAAAC GTTTGTCT6A TTTGGAGGAA GGTTTACACC 5520 
GTAAAAGGTT AATCTCCTAT GGTGGTTTGT TAAAAGAAAT ACATAAAAAA 5570 
TTAAACCTTG~ATGACACAGA AGAAGGCGAT TTGATTCATA CAGATGATGA 5620 
CGAAAAAGCC GATGAAGATG GATTTTCTAT TATTGCAATG TGGAATTGGG 5670 
AACGGAAAAA TTATTTTATT AAAGAGTAGT TCAACAAACG GGCCAGTTTG 5720 
15 TTGAAGATTA GATGCTATAA TTGTTATTAA AAGGATTGAA GGATGCTTAG 5770 
GAAGACGAGT TATTAATAGC TGAATAAGAA CGGTGCTCTC CAAATATTCT 5820 

TATTTAGAAA AGCAAATCTA AAATTATCTG AAAAGGGAAT GAGAATAGTG 5870 
AATGGACCAA TAATAATGAC TAGAGAAGAA AGAATGAAGA TTGTTCATGA 5920 
AATTAAGGAA CGAATATTGG ATAAATATGG GGATGATGTT AAGGCTATTG 5970 
GTGTTTATGG CTCTCTTGGT CGTCAGACTG ATGGGCCCTA TTCGGATATT 6020
GAGATGATGT GTGTCATGTC AACAGAGGAA GCAGAGTTCA GCCATGAATG 6070 
GACAACCGGT GAGTGGAAGG TGGAAGTGAA TTTTGATAGC GAAGAGATTC 6120 
TACTAGATTA TGCATCTCAG GTGGAATCAG ATTGGCCGCT TACACATGGT 6170 
CAATTTTTCT CTATTTTGCC GATTTATGAT TCAGGTGGAT ACTTAGAGAA 6220 
AGTGTATCAA ACTGCTAAAT CGGTAGAAGC CCAAACGTTC CACGATCCGA 6270
TTTGTGCCCT TATCGTAGAA GAGCTGTTTG AATATGCAGG CAAATCGCGT 6320 
AATATTCGTG TGCAAGGACC GACAACATTT CTACCATCCT TGACTGTACA 6370 
GGTAGCAATG GCAGGTGCCA TGTTGATTGG TCTGCATCAT CGCATCTGTT 6420 
ATACGACGAG CGCTTCGGTC TTAACTGAAG CAGTTAAGCA ATCAGATCTT 6470 
CCTTCAGGTT ATGACCATCT GTGCCAGTTC GTAATGTCTG GTCAACTTTC 6520
CGACTCTGAG AAACTTCTGG AATCGCTAGA GAATTTCTGG AATGGGATTC 6570
AGGAGTGGAC AGAACGACAC GGATATATAG TGGATGTGTC AAAACGCATA 6620 
CCATTTTGAA CGATGACCTC TAATAATTGT TAATCATGTT GGTTACGTAT 6670 
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TTXTTAACTT 


CTCCTAGTAT 


TAGTAATTAT 


CATGGCTGTC 


ATGGCGCATT 


6720 


AACGGAATAA 


AGGGTGTGCT 


TAAATCGGGC 


CATTTTGCGT 


AATAAGAAAA 


6770 


AGGATTAATT 


ATGAGCGAAT 


TGAATTAATA 


ATAAGGTAAT AGATTTACAT 


6820 


TAGAAAATGA 


AAGGGGATTT 


TATGCGTGAG 


AATGTTACAG 


TCTATCCCGG 


6870 


CAATAGTTAC 


CCTTATTATC 


AAGATAAGAA 


AGAAAAGGAT 


TTTTCGCTAC 


6920 


GCTG^AATCC 


TTTAAAAAAA 


CACAAAAGAC 


CACATTTTTT 


AATGTGGTCT 


6970 


TTATTCTTCA 


ACTAAAGCAC 


CCATTAGTTC 


AACAAACGAA AATTGGATAA 


7020 


AGTGGGATAT 


TTTTAAAATA 


TATATTTATG 


TTACAGTAAT


ATTGACTTTT 


7070 


AAAAAAGGAT 


TGATTCTAAT 


GAAGAAAGCA 


GACAAGTAAG 


CCTCCTAAAT 




TCACTTTAGA 


TAAAAATTTA 


GGAGGCATAT 


CAAATGAACT 


TTAATAAAAT 


7170 


TGATTTAGAC 


AATTGGAAGA 


GAAAAGAGAT 


ATTTAATCAT 


TATTTGAACC 


7220


AACAAACGAC 


TTTTAGTATA 


ACCACAGAAA 


TTGATATTAG 


TGTTTTATAC 


7270


CGAAAGATAA 


AACAAGAAGG 


ATATAAATTT 


TACCCTGCAT 


TTATTTTCTT 




AGTGACAAGG 


GTGATAAACT 


CAAATACAGC 


TTTTAGAACT 


GGTTACAATA 




GCGACGGAGA 


GTTAGGTTAT 


TGGGATAAGT 


TAGAGCCACT 


TTATACAATT 




TTTGATGGTG 


TATCTAAAAC 


ATTCTCTGGT 


ATTTGGACTC 


CTGTAAAGAA 


7470 


TGACTTCAAA 


GAGTTTTATG 


ATTTATATTT


TTCTGATGTA 


GAGAAATATA 










CTATACCTGA AAATGCTTTT 


ID 1\J 






wrtW 1 1 wi X i 1 


ACTGGGTTTA 


ACTTAAATAT 




CAATAATAAT 






TATTACAGCA 


GGAAAATTCA 




TTAATAAAGG 


TAATTCAATA 




TATCTTTACA 


GGTACATCAT 




TCTGTTTGTG 


ATGGTTATCA 


TGCAGGATTG 


TTTATGAACT 


CTATTCAGGA 


7770 


ATTGTCAGAT 


AGGCCTAATG 


ACTGGCTTTT 


ATAATATGAG 


ATAATGCCGA 


7820 


CTGTACTTTT 


TACAGTCGGT 


TTTCTAATGT 


CACTAACCTG 


CCCCGTTAGT 


7870 


TGAAGAAGGT 


TTTTATATTA 


CAGCTCCAGA 


TCCATATCCT 


TCTTTTTCTG 


7920 


AACCGACTTC 


TCCTTTTTCG 


CTTCTTTATT 


CCAATTGCTT 


TATTGACGTT 


7970 


GAGCCTCGGA 


ACCCNTATAG 


TGTGTTATAC 


TTTACTTGGA AGTGGTTGCC 


8020 


GGAAAGAGCG 


AAAATGCCTC 


ACATTTGTGC 


CACCTAAAAA 


GGAGCGATTT 


8070 


ACATATGAGT 


TATGCAGTTT 


GTAGAATGCA 


AAAAGTGAAA 


TCAGGATCN 8119 



(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 382 amino acids

(B) TYPE: Amino Acid
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:

Val Arg Gly Lys Lys Val Trp Ile Ser Leu Leu Phe Ala Leu Ala
-107 -105 -100 -95

Leu Ile Phe Thr Met Ala Phe Gly Ser Thr Ser Ser Ala Gln Ala
-90 -85 -80

Ala Gly Lys Ser Asn Gly Glu Lys Lys Tyr Ile Val Gly Phe Lys
-75 -70 -65

Gln Thr Met Ser Thr Met Ser Ala Ala Lys Lys Lys Asp Val Ile
-60 -55 -50

Ser Glu Lys Gly Gly Lys Val Gln Lys Gln Phe Lys Tyr Val Asp
-45 -40 -35

Ala Ala Ser Ala Thr Leu Asn Glu Lys Ala Val Lys Glu Leu Lys
-30 -25 -20

Lys Asp Pro Ser Val Ala Tyr Val Glu Glu Asp His Val Ala His
-15 -10 -5

Ala Tyr Ala Gln Ser Val Pro Tyr Gly Val Ser Gln Ile Lys Ala
1 5 10

Pro Ala Leu His Ser Gln Gly Tyr Thr Gly Ser Asn Val Lys Val
15 20 25

Ala Val Ile Asp Ser Gly Ile Asp Ser Ser His Pro Asp Leu Lys
30 35 40

Val Ala Gly Gly Ala Ser Met Val Pro Ser Glu Thr Asn Pro Phe
45 50 55

Gln Asp Asn Asp Ser His Gly Thr His Val Ala Gly Thr Val Ala
60 65 70

Ala Leu Asn Asn Ser Ile Gly Val Leu Gly Val Ala Pro Ser Ala
75 80 85

Ser Leu Tyr Ala Val Lys Val Leu Gly Ala Asp Gly Ser Gly Gln
90 95 100

Tyr Ser Trp Ile Ile Asn Gly Ile Glu Trp Ala Ile Ala Asn Asn
105 110 115

Met Asp Val Ile Asn Met Ser Leu Gly Gly Pro Ser Gly Ser Ala
120 125 130

Ala Leu Lys Ala Ala Val Asp Lys Ala Val Ala Ser Gly Val Val
135 140 145

Val Val Ala Ala Ala Gly Asn Glu Gly Thr Ser Gly Ser Ser Ser
150 155 160

Thr Val Asp Tyr Pro Gly Lys Tyr Pro Ser Val Ile Ala Val Gly
165 170 175

Ala Val Asp Ser Ser Asn Gln Arg Ala Ser Phe Ser Ser Val Gly
180 185 190

Pro Glu Leu Asp Val Met Ala Pro Gly Val Ser Ile Gln Ser Thr
195 200 205

Leu Pro Gly Asn Lys Tyr Gly Ala Tyr Asn Gly Thr Ser Met Ala
210 215 220

Ser Pro His Val Ala Gly Ala Ala Ala Leu Ile Leu Ser Lys His
225 230 235

Pro Asn Trp Thr Asn Thr Gln Val Arg Ser Ser Leu Glu Asn Thr
240 245 250

Thr Thr Lys Leu Gly Asp Ser Phe Tyr Tyr Gly Lys Gly Leu Ile
255 260 265

Asn Val Gln Ala Ala Ala Gln
270 275

(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:

Ser Leu Gly Gly Pro Ser Gly 
1 5 7 

(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 7 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:

Ala Ala Ala Gly Asn Glu Gly
1 5 7 

(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 6 amino acids 
(B) TYPE: Amino Acid

(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 

Ser Thr Val Gly Tyr Pro 
1 5 6 

(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:

Ser Trp Gly Pro Ala Asp Asp
1 5 7

(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 7 amino acids

(B) TYPE: Amino Acid
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:

Phe Ala Ser Gly Asn Gly Gly
1 5 7

(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7 amino acids

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 

Cys Asn Tyr Asp Gly Tyr Thr 
1 5

(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 7 amino acids
(B) TYPE: Amino Acid
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 

Ser Trp Gly Pro Glu Asp Asp 
1 5 7 

(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 7 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 

Trp Ala Ser Gly Asn Gly Gly
1 5 7 

(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 7 amino acids
(B) TYPE: Amino Acid

(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 

Cys Asn Cys Asp Gly Tyr Thr 
1 5 

(2) INFORMATION FOR SEQ ID NO: 12:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:

Trp Ala Ser Gly Asp Gly Gly
1 5 7

(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 7 amino acids

(B) TYPE: Amino Acid
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:

Cys Asn Cys Asp Gly Tyr Ala
1 5 7

(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 6 amino acids 
(B) TYPE: Amino Acid

(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:

Val Ile Asp Ser Gly Ile
1 5 6

(2) INFORMATION FOR SEQ ID NO: 15:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5 amino acids

(B) TYPE: Amino Acid
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:

Asp Asn Asn Ser His 
1 5 

(2) INFORMATION FOR SEQ ID NO: 16:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 6 amino acids

(B) TYPE: Amino Acid
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:

Ile Val Asp Asp Gly Leu
1 5 6

(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 

Ser Asp Asp Tyr His 
1 5

(2) INFORMATION FOR SEQ ID NO: 18:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 6 amino acids

(B) TYPE: Amino Acid
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:

Ile Leu Asp Asp Gly Ile
1 5 6

(2) INFORMATION FOR SEQ ID NO: 19:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5 amino acids

(B) TYPE: Amino Acid
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:

Asn Asp Asn Arg His
1 5 

(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 6 amino acids
(B) TYPE: Amino Acid

(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20:

Ile Met Asp Asp Gly Ile
1 5 6 

(2) INFORMATION FOR SEQ ID NO: 21:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21:

Trp Phe Asn Ser His
1 5 

(2) INFORMATION FOR SEQ ID NO:22: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 27 base pairs

(B) TYPE: Nucleic Acid

(C) STRANDEDNESS: Single

(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22:

GCGCTTATCG ACGACGGTAT CGATTCT 27

(2) INFORMATION FOR SEQ ID NO: 23:

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 27 base pairs 
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GCGCTTAT 

,„ INFOBMATION FOB SEQ 
U1 SEQUENCE CHAKAC^^^^ 

,,,, SEOOE«CE .ESC«nXOS: SEQ 

ursEQ.t«cECH;^".Sr^^^^^^ 

^,11 SoEDNESS: Single 
^.^ TOPOLOGY. L^n«« 

j^i, SEQUENCE DESCRIPT 

" SEQUENCE CH.^",^^^^^^^ 

lAl l-ENf «' tLic Acid 
S ISoEdSII: Single 

D Topology: linear 

jxi) SEQOEUCE DESCWPT 
CCAAGACAAC ..A-TCACG GAA 

(2) INFORMATION FOR SEQ ID NO: 28:

(B) TYPE: Nucleic Acid

(C) STRANDEDNESS: Single 

(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:28: 



CACTTCCGGC AGCTCGTCGA CAGTGGACTA CCCTGGCAAA TA 42 
(2) INFORMATION FOR SEQ ID NO: 29:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 42 base pairs 

(B) TYPE: Nucleic Acid 

(C) STRANDEDNESS: Single

(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:29: 



CACTTCCGGC AGCTCGTCGA CAGTGGAGTA CCCTGGCAAA TA 42
(2) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 41 base pairs 

(B) TYPE: Nucleic Acid 

(C) STRANDEDNESS: Single 

(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30: 



TTAACATGAG CCTCGGCCCA GCTAGCGGTT CTGCTGCTTT A 41 
(2) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 43 base pairs 

(B) TYPE: Nucleic Acid 

(C) STRANDEDNESS: Single 

(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31: 



TTAACATGAG CCTCGGCCCC GCGGATGATT CTGCTGCTTT AAA 43
(2) INFORMATION FOR SEQ ID NO: 32:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4*7 base pairs 

(B) TYPE: Nucleic Acid 

(C) STRANDEDNESS: Single 
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32:



CGGCAGCTCA AGCAACGATG GCTATCCTGG CAAATACCCT TC 
(2) INFORMATION FOR SEQ ID NO: 33: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 44 base pairs

(B) TYPE: Nucleic Acid

(C) STRANDEDNESS: Single

(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33:

ACTTCCGGCA GCTCTTCGAA CTACGACGGG TACCCTGGCA AATA 44

(2) INFORMATION FOR SEQ ID NO: 34:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 5 amino acids
(B) TYPE: Amino Acid
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:34: 

Asn Leu Thr Ala Arg 
1 5 

(2) INFORMATION FOR SEQ ID NO: 35:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5 amino acids
(B) TYPE: Amino Acid
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35:

Asn Leu Met Arg Lys
1 5 

(2) INFORMATION FOR SEQ ID NO: 36: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 5 amino acids
(B) TYPE: Amino Acid

(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36: 

Thr Ala Ser Arg Arg 
1 5 

(2) INFORMATION FOR SEQ ID NO: 37:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37:

Leu Thr Arg Arg Ser 
1 5 

(2) INFORMATION FOR SEQ ID NO: 38: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 5 amino acids

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38:

Ala Leu Ser Arg Lys
1 5

(2) INFORMATION FOR SEQ ID NO: 39:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 5 amino acids

(B) TYPE: Amino Acid
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39:

Leu Met Leu Arg Lys
1 5

(2) INFORMATION FOR SEQ ID NO: 40:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40: 

Ala Ser Thr His Phe 
1 5 

(2) INFORMATION FOR SEQ ID NO: 41: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5 amino acids
(B) TYPE: Amino Acid
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41:

Gln Lys Pro Asn Phe
1 5 

(2) INFORMATION FOR SEQ ID NO: 42: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 5 amino acids
(B) TYPE: Amino Acid

(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42: 

Arg Lys Pro Thr His 
1 5 

(2) INFORMATION FOR SEQ ID NO: 43:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43:

Ile Gln Gln Gln Tyr
1 5 

(2) INFORMATION FOR SEQ ID NO: 44: 
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SEQUENCE DESCRIP-I 

1,-r. Pro Thr 
Ala pro ASP 5 

^1 Tvpr- AiTvino 
„ Gl» "» "° "1 

SEQUENT OESCRII- 
val Ast, Asn Asn His 

(B) TYPE: Amino Acid
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 49:

Ala Gln Ser Asn Leu
1 5

(2) INFORMATION FOR SEQ ID NO: 50:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5 amino acids

(B) TYPE: Amino Acid
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 50:

Thr Ala Ser Arg Arg
1 5 

(2) INFORMATION FOR SEQ ID NO: 51: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 6 amino acids 
(B) TYPE: Amino Acid

(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 51:

His His His His His His 
1 5 6 

(2) INFORMATION FOR SEQ ID NO: 52:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 52:

Leu Met Arg Lys 
1 4 

(2) INFORMATION FOR SEQ ID NO: 53: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 4 amino acids

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 53: 

Leu Thr Ala Arg 
1 4

(2) INFORMATION FOR SEQ ID NO: 54: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:54: 

Gly Pro Gly Gly 
1 4

(2) INFORMATION FOR SEQ ID NO: 55:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5 amino acids

(B) TYPE: Amino Acid
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 55:

Gly Leu Met Arg Lys
1 5

(2) INFORMATION FOR SEQ ID NO: 56: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 4 amino acids

(B) TYPE: Amino Acid
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 56:

Ala Ala Pro Phe
1 4

(2) INFORMATION FOR SEQ ID NO: 57:

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 13 amino acids 
(B) TYPE: Amino Acid

(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:57: 

Gly Pro Gly Gly Xaa Xaa Xaa Xaa Xaa Gly Gly Pro Gly 
1 5 10 13

(2) INFORMATION FOR SEQ ID NO: 58:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 4 amino acids
(B) TYPE: Amino Acid

(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 58:

Ala Ala Pro Lys 
1 4 

(2) INFORMATION FOR SEQ ID NO: 59:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 4 amino acids

(B) TYPE: Amino Acid
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 59:

Ala Ala Pro Arg
1 4

(2) INFORMATION FOR SEQ ID NO: 60: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4 amino acids 

(B) TYPE: Amino Acid
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 60:

Ala Ala Pro Met
1 4

(2) INFORMATION FOR SEQ ID NO: 61:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 4 amino acids

(B) TYPE: Amino Acid
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 61:

Ala Ala Pro Gln
1 4

(2) INFORMATION FOR SEQ ID NO: 62: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 4 amino acids

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 62: 

Ala Ala Lys Phe 
1 4

(2) INFORMATION FOR SEQ ID NO: 63: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 63: 

Ala Ala Ala Phe 
1 4 

(2) INFORMATION FOR SEQ ID NO: 64: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 4 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:64: 

Ala Ala Arg Phe
1 4 

(2) INFORMATION FOR SEQ ID NO: 65: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 4 amino acids 
(B) TYPE: Amino Acid

(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 65:

Ala Ala Asp Phe
1 4

(2) INFORMATION FOR SEQ ID NO: 66:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 4 amino acids
(B) TYPE: Amino Acid

(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 66:

Ala Ala Lys Lys
1 4

(2) INFORMATION FOR SEQ ID NO: 67:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 4 amino acids

(B) TYPE: Amino Acid
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 67:

Ala Ala Lys Arg 
1 4 

(2) INFORMATION FOR SEQ ID NO: 68:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 4 amino acids

(B) TYPE: Amino Acid
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 68:

Ala Ala Lys Phe
1 4

(2) INFORMATION FOR SEQ ID NO: 69: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 4 amino acids
(B) TYPE: Amino Acid
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 69: 

Ala Ala Pro Xaa 
1 4 

(2) INFORMATION FOR SEQ ID NO:70: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 4 amino acids

(B) TYPE: Amino Acid
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 70:

Ala Ala Xaa Phe
1 4 

(2) INFORMATION FOR SEQ ID NO: 71: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 5 amino acids

(B) TYPE: Amino Acid
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 71:

Ala Ala Xaa Xaa Xaa
1 5

(2) INFORMATION FOR SEQ ID NO: 72:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 275 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 72:

Ala Gln Ser Val Pro Tyr Gly Val Ser Gln Ile Lys Ala Pro Ala
1 5 10

Leu His Ser Gln Gly Tyr Thr Gly Ser Asn Val Lys Val Ala Val
20 25 30

Ile Asp Ser Gly Ile Asp Ser Ser His Pro Asp Leu Lys Val Ala
35 40

Gly Gly Ala Ser Met Val Pro Ser Glu Thr Asn Pro Phe Gln Asp
50 55 60

Asn Asp Ser His Gly Thr His Val Ala Gly Thr Val Ala Ala Leu
65 70

Asn Asn Ser Ile Gly Val Leu Gly Val Ala Pro Ser Ala Ser Leu
80 85

Tyr Ala Val Lys Val Leu Gly Ala Asp Gly Ser Gly Gln Tyr Ser
95 100

Trp Ile Ile Asn Gly Ile Glu Trp Ala Ile Ala Asn Asn Met Asp
110 115 120

Val Ile Asn Met Ser Leu Gly Gly Pro Ser Gly Ser Ala Ala Leu
125 130 135

Lys Ala Ala Val Asp Lys Ala Val Ala Ser Gly Val Val Val Val
140 145

Ala Ala Ala Gly Asn Glu Gly Thr Ser Gly Ser Ser Ser Thr Val
155 160 165

Asp Tyr Pro Gly Lys Tyr Pro Ser Val Ile Ala Val Gly Ala Val
170

Asp Ser Ser Asn Gln Arg Ala Ser Phe Ser Ser Val Gly Pro Glu
185 190 195

Leu Asp Val Met Ala Pro Gly Val Ser Ile Gln Ser Thr Leu Pro
200 205 210

Gly Asn Lys Tyr Gly Ala Tyr Asn Gly Thr Ser Met Ala Ser Pro
215 220

His Val Ala Gly Ala Ala Ala Leu Ile Leu Ser Lys His Pro Asn
230 235 240

Trp Thr Asn Thr Gln Val Arg Ser Ser Leu Glu Asn Thr Thr Thr
245 250 255

Lys Leu Gly Asp Ser Phe Tyr Tyr Gly Lys Gly Leu Ile Asn Val
260 265 270

Gln Ala Ala Ala Gln
275

(2) INFORMATION FOR SEQ ID NO: 73:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 4 amino acids
(B) TYPE: Amino Acid

(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 73: 

Arg Val Arg Arg 
1 4 

(2) INFORMATION FOR SEQ ID NO: 74:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1146 base pairs

(B) TYPE: Nucleic Acid

(C) STRANDEDNESS: Single
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 74:



GTG AGA GGC AAA AAA GTA TGG ATC AGT TTG CTG TTT 36
Val Arg Gly Lys Lys Val Trp Ile Ser Leu Leu Phe
-107 -105 -100

GCT TTA GCG TTA ATC TTT ACG ATG GCG TTC GGC AGC ACA 75
Ala Leu Ala Leu Ile Phe Thr Met Ala Phe Gly Ser Thr
-95 -90 -85

TCC TCT GCC CAG GCG GCA GGG AAA TCA AAC GGG GAA AAG 114
Ser Ser Ala Gln Ala Ala Gly Lys Ser Asn Gly Glu Lys
-80 -75 -70

AAA TAT ATT GTC GGG TTT AAA CAG ACA ATG AGC ACG ATG 153
Lys Tyr Ile Val Gly Phe Lys Gln Thr Met Ser Thr Met
-65 -60

AGC GCC GCT AAG AAG AAA GAT GTC ATT TCT GAA AAA GGC 192
Ser Ala Ala Lys Lys Lys Asp Val Ile Ser Glu Lys Gly
-55 -50 -45

GGG AAA GTG CAA AAG CAA TTC AAA TAT GTA GAC GCA GCT 231
Gly Lys Val Gln Lys Gln Phe Lys Tyr Val Asp Ala Ala
-40 -35

TCA GCT ACA TTA AAC GAA AAA GCT GTA AAA GAA TTG AAA 270
Ser Ala Thr Leu Asn Glu Lys Ala Val Lys Glu Leu Lys
-30 -25 -20

AAA GAC CCG AGC GTC GCT TAC GTT GAA GAA GAT CAC GTA 309
Lys Asp Pro Ser Val Ala Tyr Val Glu Glu Asp His Val
-15 -10 -5

AGA CAT AAG CGC GCG CAG TCC GTG CCT TAC GGC GTA TCA 348
Arg His Lys Arg Ala Gln Ser Val Pro Tyr Gly Val Ser
1

CAA ATT AAA GCC CCT GCT CTG CAC TCT CAA GGC TAC ACT 387
Gln Ile Lys Ala Pro Ala Leu His Ser Gln Gly Tyr Thr
5 10 15

GGA TCA AAT GTT AAA GTA GCG GTT ATC GAC AGC GGT ATC 426
Gly Ser Asn Val Lys Val Ala Val Ile Asp Ser Gly Ile
25

GAT TCT TCT CAT CCT GAT TTA AAG GTA GCA GGC GGA GCC 465
Asp Ser Ser His Pro Asp Leu Lys Val Ala Gly Gly Ala
40 45

AGC ATG GTT CCT TCT GAA ACA AAT CCT TTC CAA GAC AAC 504
Ser Met Val Pro Ser Glu Thr Asn Pro Phe Gln Asp Asn
50 55 60

GAC TCT CAC GGA ACT CAC GTT GCC GGC ACA GTT GCG GCT 543
Asp Ser His Gly Thr His Val Ala Gly Thr Val Ala Ala
65 70

CTT AAT AAC TCA ATC GGT GTA TTA GGC GTT GCG CCA AGC 582
Leu Asn Asn Ser Ile Gly Val Leu Gly Val Ala Pro Ser
75 80 85

GCA TCA CTT TAC GCT GTA AAA GTT CTC GGT GCT GAC GGT 621
Ala Ser Leu Tyr Ala Val Lys Val Leu Gly Ala Asp Gly
90 95 100

TCC GGC CAA GAT AGC TGG ATC ATT AAC GGA ATC GAG TGG 660
Ser Gly Gln Asp Ser Trp Ile Ile Asn Gly Ile Glu Trp
105 110

GCG ATC GCA AAC AAT ATG GAC GTT ATT AAC ATG AGC CTC 699
Ala Ile Ala Asn Asn Met Asp Val Ile Asn Met Ser Leu
115 120 125

GGC GGA CCT TCT GGT TCT GCT GCT TTA AAA GCG GCA GTT 738
Gly Gly Pro Ser Gly Ser Ala Ala Leu Lys Ala Ala Val
130 135

GAT AAA GCC GTT GCA TCC GGC GTC GTA GTC GTT GCG GCA 777
Asp Lys Ala Val Ala Ser Gly Val Val Val Val Ala Ala
140 145 150

GCC GGT AAC GAA GGC ACT TCC GGC AGC TCG TCG ACA GTG 816
Ala Gly Asn Glu Gly Thr Ser Gly Ser Ser Ser Thr Val
155 160 165

GAC TAC CCT GGC AAA TAC CCT TCT GTC ATT GCA GTA GGC 855
Asp Tyr Pro Gly Lys Tyr Pro Ser Val Ile Ala Val Gly
170 175

GCT GTT GAC AGC AGC AAC CAA AGA GCA TCT TTC TCA AGC 894
Ala Val Asp Ser Ser Asn Gln Arg Ala Ser Phe Ser Ser
180 185 190

GTA GGA CCT GAG CTT GAT GTC ATG GCA CCT GGC GTA TCT 933
Val Gly Pro Glu Leu Asp Val Met Ala Pro Gly Val Ser
195 200

ATC CAA AGC ACG CTT CCT GGA AAC AAA TAC GGG GCG TAC 972
Ile Gln Ser Thr Leu Pro Gly Asn Lys Tyr Gly Ala Tyr
205 210 215

AAC GGT ACC TCA ATG GCA TCT CCG CAC GTT GCC GGA GCG 1011
Asn Gly Thr Ser Met Ala Ser Pro His Val Ala Gly Ala
220 225 230

GCT GCT TTG ATT CTT TCT AAG CAC CCG AAC TGG ACA AAC 1050
Ala Ala Leu Ile Leu Ser Lys His Pro Asn Trp Thr Asn
235 240

ACT CAA GTC CGC AGC AGT TTA GAA AAC ACC ACT ACA AAA 1089
Thr Gln Val Arg Ser Ser Leu Glu Asn Thr Thr Thr Lys
245 250 255

CTT GGT GAT TCT TTC TAC TAT GGA AAA GGG CTG ATC AAC 1128
Leu Gly Asp Ser Phe Tyr Tyr Gly Lys Gly Leu Ile Asn
260 265

GTA CAG GCG GCA GCT CAG 1146
Val Gln Ala Ala Ala Gln
270 275
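
The figures stated for SEQ ID NO: 74 can be cross-checked with a short script. The following is only an illustrative sketch and not part of the filed listing: the helper names `translate` and `residue_count` are hypothetical, the codon table is deliberately partial (it covers only the codons in the first printed row), and the residue numbering is assumed to skip zero, running from -107 to 275.

```python
# Partial codon table: only the codons that occur in the first printed row
# of SEQ ID NO: 74 (standard genetic code, three-letter residue names).
CODON_TO_RESIDUE = {
    "GTG": "Val", "AGA": "Arg", "GGC": "Gly", "AAA": "Lys",
    "GTA": "Val", "TGG": "Trp", "ATC": "Ile", "AGT": "Ser",
    "TTG": "Leu", "CTG": "Leu", "TTT": "Phe",
}


def translate(row):
    """Translate a space-separated row of codons into residue names."""
    return [CODON_TO_RESIDUE[codon] for codon in row.split()]


def residue_count(first, last):
    """Count residues when the numbering skips zero (..., -2, -1, 1, 2, ...)."""
    if first < 0 < last:
        return last - first  # no residue is numbered 0
    return last - first + 1


FIRST_ROW = "GTG AGA GGC AAA AAA GTA TGG ATC AGT TTG CTG TTT"
NUCLEOTIDES = 1146  # stated length of SEQ ID NO: 74

print(" ".join(translate(FIRST_ROW)))
print(NUCLEOTIDES // 3, residue_count(-107, 275))
```

Both counts come out to 382, which agrees with the length stated for SEQ ID NO: 75.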

(2) INFORMATION FOR SEQ ID NO: 75:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 382 amino acids
(B) TYPE: Amino Acid

(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 75:

Val Arg Gly Lys Lys Val Trp Ile Ser Leu Leu Phe Ala Leu Ala
-107 -105 -100 -95

Leu Ile Phe Thr Met Ala Phe Gly Ser Thr Ser Ser Ala Gln Ala
-90 -85 -80

Ala Gly Lys Ser Asn Gly Glu Lys Lys Tyr Ile Val Gly Phe Lys
-75 -70 -65

Gln Thr Met Ser Thr Met Ser Ala Ala Lys Lys Lys Asp Val Ile
-60 -55 -50

Ser Glu Lys Gly Gly Lys Val Gln Lys Gln Phe Lys Tyr Val Asp
-45 -40 -35

Ala Ala Ser Ala Thr Leu Asn Glu Lys Ala Val Lys Glu Leu Lys
-30 -25 -20

Lys Asp Pro Ser Val Ala Tyr Val Glu Glu Asp His Val Arg His
-15 -10 -5

Lys Arg Ala Gln Ser Val Pro Tyr Gly Val Ser Gln Ile Lys Ala
1 5 10

Pro Ala Leu His Ser Gln Gly Tyr Thr Gly Ser Asn Val Lys Val
15 20 25

Ala Val Ile Asp Ser Gly Ile Asp Ser Ser His Pro Asp Leu Lys
30 35 40

Val Ala Gly Gly Ala Ser Met Val Pro Ser Glu Thr Asn Pro Phe
45 50 55

Gln Asp Asn Asp Ser His Gly Thr His Val Ala Gly Thr Val Ala
60 65 70

Ala Leu Asn Asn Ser Ile Gly Val Leu Gly Val Ala Pro Ser Ala

Ser Leu Tyr Ala Val Lys Val Leu Gly Ala Asp Gly Ser Gly Gln
90 95

Asp Ser Trp Ile Ile Asn Gly Ile Glu Trp Ala Ile Ala Asn Asn
105 110 115

Met Asp Val Ile Asn Met Ser Leu Gly Gly Pro Ser Gly Ser Ala
120 125

Ala Leu Lys Ala Ala Val Asp Lys Ala Val Ala Ser Gly Val Val
135

Val Val Ala Ala Ala Gly Asn Glu Gly Thr Ser Gly Ser Ser Ser
150 155 160

Thr Val Asp Tyr Pro Gly Lys Tyr Pro Ser Val Ile Ala Val Gly
165 170 175

Ala Val Asp Ser Ser Asn Gln Arg Ala Ser Phe Ser Ser Val Gly
180 185

Pro Glu Leu Asp Val Met Ala Pro Gly Val Ser Ile Gln Ser Thr
195 200 205

Leu Pro Gly Asn Lys Tyr Gly Ala Tyr Asn Gly Thr Ser Met Ala
210 215 220

Ser Pro His Val Ala Gly Ala Ala Ala Leu Ile Leu Ser Lys His
225 230 235

Pro Asn Trp Thr Asn Thr Gln Val Arg Ser Ser Leu Glu Asn Thr
240 245 250

Thr Thr Lys Leu Gly Asp Ser Phe Tyr Tyr Gly Lys Gly Leu Ile
255 260 265

Asn Val Gln Ala Ala Ala Gln
270 275

(2) INFORMATION FOR SEQ ID NO: 76:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5 amino acids
(B) TYPE: Amino Acid

(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 76:

Asn Arg Met Arg Lys 
1 5 

(2) INFORMATION FOR SEQ ID NO: 77: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 11 amino acids

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 77:

Gly Ser Gly Gln Tyr Ser Trp Ile Ile Asn Gly
1 5 10 11

(2) INFORMATION FOR SEQ ID NO: 78:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 11 amino acids

(B) TYPE: Amino Acid
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 78:

Gly Asp Ile Thr Thr Glu Asp Glu Ala Ala Ser
1 5 10 11

(2) INFORMATION FOR SEQ ID NO: 79:



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 11 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 79:

Gly Glu Val Thr Asp Ala Val Glu Ala Arg Ser 
1 5 10 11 



(2) INFORMATION FOR SEQ ID NO: 80:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 11 amino acids

(B) TYPE: Amino Acid
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 80:

Pro Phe Met Thr Asp Ile Ile Glu Ala Ser Ser
1 5 10 11 



(2) INFORMATION FOR SEQ ID NO: 81: 



(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 11 amino acids
(B) TYPE: Amino Acid

(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 81:

Gly Ile Val Thr Asp Ala Ile Glu Ala Ser Ser
1 5 10 11 



(2) INFORMATION FOR SEQ ID NO: 82:



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: Nucleic Acid 

(C) STRANDEDNESS: Single 
(D) TOPOLOGY: Linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 82: 



GGTTCCGGCC AAGATAGCTG GATCATT 27 
(2) INFORMATION FOR SEQ ID NO: 83:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 29 base pairs

(B) TYPE: Nucleic Acid

(C) STRANDEDNESS: Single
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:83: 



CCAATACAGC TGGGAAATTA ACGGAATCG 29 
(2) INFORMATION FOR SEQ ID NO: 84:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 31 base pairs

(B) TYPE: Nucleic Acid
(C) STRANDEDNESS: Single
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 84:

GGTTCCGGCC AAGATAGCTG GGAAATTAAC G 31
(2) INFORMATION FOR SEQ ID NO: 85: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: Nucleic Acid 
(C) STRANDEDNESS: Single

(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 85:



AAGAAGATCA CGTAAGACAT AAGCGCGCGC 30 
(2) INFORMATION FOR SEQ ID NO: 86: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 4 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 86: 
Arg Ala Lys Arg

(2) INFORMATION FOR SEQ ID NO: 87: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 4 amino acids 
(B) TYPE: Amino Acid

(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 87: 

Lys Ala Lys Arg 
1 4 

(2) INFORMATION FOR SEQ ID NO: 88:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 8 amino acids
(B) TYPE: Amino Acid
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 88:

Gly Pro Gly Gly Leu Met Arg Lys
1 5 8

(2) INFORMATION FOR SEQ ID NO: 89:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 8 amino acids

(B) TYPE: Amino Acid
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 89: 

Gly Pro Gly Gly Lys Ala Lys Arg 
1 5 8

What is claimed is:

1. A subtilisin variant derived from a precursor subtilisin-type serine protease, said variant
capable of cleaving a polypeptide substrate comprising the sequence:

            O H
            ‖ |
P4-P3-P2-P1-C-N-P1'

wherein:

P4 is a basic amino acid;
P3 is any amino acid selected from the naturally occurring amino acids;

P2 is a basic amino acid;
P1 is a basic amino acid; and
P1' is not Pro.
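
The residue constraints recited in claim 1 can be illustrated with a small checker. This is a minimal sketch under stated assumptions, not language from the claims: the function name `matches_claim1` is hypothetical, "basic" is taken here to mean Lys or Arg only (pass a different set if His should count), and residues are given as three-letter codes.

```python
# The twenty naturally occurring amino acids, three-letter codes.
NATURAL = frozenset({
    "Ala", "Arg", "Asn", "Asp", "Cys", "Gln", "Glu", "Gly", "His", "Ile",
    "Leu", "Lys", "Met", "Phe", "Pro", "Ser", "Thr", "Trp", "Tyr", "Val",
})

# Assumption for this sketch: "basic" means Lys or Arg.
BASIC = frozenset({"Lys", "Arg"})


def matches_claim1(p4_to_p1, p1_prime, basic=BASIC):
    """Return True if (P4, P3, P2, P1) followed by P1' satisfies the pattern:
    P4, P2 and P1 basic; P3 any natural residue; P1' anything but Pro."""
    p4, p3, p2, p1 = p4_to_p1
    return (
        p4 in basic
        and p3 in NATURAL
        and p2 in basic
        and p1 in basic
        and p1_prime != "Pro"
    )


# SEQ ID NO: 86 (Arg Ala Lys Arg) followed by a hypothetical P1' of Ser.
print(matches_claim1(["Arg", "Ala", "Lys", "Arg"], "Ser"))
# SEQ ID NO: 56 (Ala Ala Pro Phe) has no basic residues at P4, P2 or P1.
print(matches_claim1(["Ala", "Ala", "Pro", "Phe"], "Ser"))
```

Keeping the basic set as a parameter leaves the interpretation of "basic amino acid" to the caller rather than fixing it in the sketch.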

2. The subtiiisin variant of claim 1 containing an acidic amino acid at a residue equivalent to 
15 Asn 62, Tyr I W and Gty 166 of the subtiiisin naturally produced by Bacillus amyloliquefacicns, 

3. The subtilisin-type serine protease variant of claim 2 wherein the acidic amino acid is Asp 
or Glu. 

4. The subtilisin-type serine protease variant of claim 3 wherein the acidic amino acid is Asp. 

5. The subtilisin-type serine protease variant of claim 2 wherein the precursor subtilisin-type 
serine protease is the subtilisin naturally produced by Bacillus amyloliquefaciens. 

6. The subtilisin variant of claim 5 having the amino acid sequence of the mature polypeptide 
of Figure 8 (SEQ ID NO: 75). 

7. A subtilisin variant having substrate specificity for peptide substrates containing dibasic 
amino acid sequences. 

8. The subtilisin variant of claim 7 having a different amino acid residue at residue position +62 
than subtilisin naturally produced by Bacillus amyloliquefaciens. 

9. The subtilisin variant of Claim 8 having an Asp or Glu at residue position +62. 

10. The subtilisin variant of Claim 9 having an Asp at residue position +62. 

11. The subtilisin variant of Claim 10 further having an Asp or Glu at residue position +166. 

12. The subtilisin variant of Claim 11 having an Asp at residue position +166. 

13. The subtilisin variant of Claim 12 having the amino acid sequence of the mature polypeptide 
provided in Fig. 6. 

14. An isolated nucleic acid molecule encoding the subtilisin variant of Claim 1. 

15. The nucleic acid molecule of Claim 14 further comprising a promoter operably linked to the 
nucleic acid molecule. 

16. An expression vector comprising the nucleic acid molecule of Claim 15 operably linked to 
control sequences recognized by a host cell transformed with the vector. 

17. A host cell transformed with the vector of Claim 16. 

18. An isolated nucleic acid molecule encoding the subtilisin variant of Claim 7. 

19. The nucleic acid molecule of Claim 18 further comprising a promoter operably linked to the 
nucleic acid molecule. 

20. An expression vector comprising the nucleic acid molecule of Claim 19 operably linked to 
control sequences recognized by a host cell transformed with the vector. 

21. A host cell transformed with the vector of Claim 20. 

22. A process of using the nucleic acid molecule encoding the subtilisin variant to effect 
production of the subtilisin variant comprising culturing the host cell of Claim 21 under conditions suitable 
for expression of the subtilisin variant. 

23. The process of Claim 22 further comprising recovering the subtilisin variant from the host 
cell culture medium. 

24. A method of using the subtilisin variant of Claim 1 comprising contacting a fusion protein 
containing a dibasic sequence with the subtilisin variant. 

25. A process for cleaving a polypeptide, said polypeptide comprising an amino acid sequence 
represented by the formula: 

P4-P3-P2-P1-P1' 
wherein, 

P4 is a basic amino acid; 
P3 is an amino acid selected from the naturally occurring amino acids; 
P2 is a basic amino acid; 
P1 is a basic amino acid; and 
P1' is not Pro; 
comprising the step of: 

subjecting said polypeptide to the subtilisin variant of claim 1 in a reaction mixture under conditions 
such that the subtilisin variant cleaves the polypeptide. 
26. A process of using the nucleic acid molecule encoding the subtilisin variant to effect 
production of the subtilisin variant comprising culturing the host cell of Claim 17 under conditions suitable 
for expression of the subtilisin variant. 

27. The process of Claim 26 further comprising recovering the subtilisin variant from the host 
cell culture medium. 

28. A method of using the subtilisin variant of Claim 7 comprising contacting a fusion protein 
containing a dibasic sequence with the subtilisin variant. 

29. A process for cleaving a polypeptide, said polypeptide comprising an amino acid sequence 
represented by the formula: 

P4-P3-P2-P1-P1' 
wherein, 

P4 is a large hydrophobic amino acid; 
P3 is an amino acid selected from the naturally occurring amino acids; 
P2 is a basic amino acid; 
P1 is a basic amino acid; and 
P1' is not Pro; 
comprising the step of: 

subjecting said polypeptide to the subtilisin variant of claim 7 in a reaction mixture under conditions 
such that the subtilisin variant cleaves the polypeptide. 
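
The substrate formula recited in claims 1 and 25 (P4 basic, P3 any naturally occurring amino acid, P2 basic, P1 basic, P1' not Pro) can be illustrated as a simple sequence scan. The following is a minimal sketch under stated assumptions and is not a method described in the specification: "basic" is taken to mean Lys or Arg only (whether His is included is an assumption left to the reader), sequences are written in one-letter code, and the function name `find_candidate_sites` and the trailing residues appended to the test peptides are hypothetical.

```python
# Assumption: "basic amino acid" is read here as Lys (K) or Arg (R) only.
BASIC = frozenset("KR")
# The twenty naturally occurring amino acids in one-letter code.
NATURAL = frozenset("ACDEFGHIKLMNPQRSTVWY")


def find_candidate_sites(seq):
    """Return 0-based indices i where seq[i] is P1' of a matching site.

    A site matches when P4, P2 and P1 are basic, P3 is any natural
    amino acid, and P1' is a natural amino acid other than Pro. The
    scissile bond would lie between seq[i - 1] (P1) and seq[i] (P1').
    """
    sites = []
    for i in range(4, len(seq)):
        p4, p3, p2, p1, p1_prime = seq[i - 4:i + 1]
        if (
            p4 in BASIC
            and p3 in NATURAL
            and p2 in BASIC
            and p1 in BASIC
            and p1_prime in NATURAL
            and p1_prime != "P"
        ):
            sites.append(i)
    return sites


if __name__ == "__main__":
    # SEQ ID NO: 89 (GPGGKAKR) followed by hypothetical residues "SA".
    print(find_candidate_sites("GPGGKAKRSA"))
    # Same peptide with Pro at P1', which the formula excludes.
    print(find_candidate_sites("GPGGKAKRPA"))
```

Such a scan only flags sequences that fit the recited formula; it says nothing about cleavage rates, which depend on the enzyme and the reaction conditions.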